INTRODUCTION

This handbook describes how the Am29000™ Streamlined Instruction Processor can be used in graphics applications. It presents primitive graphics functions for 2-D and 3-D rendering of raster displays and evaluates the performance of the Am29000 using standard graphics benchmarks. It also discusses additional hardware and software techniques that can be used to enhance performance. The example programs use low-level routines that can be used to port the standard graphics libraries, such as GKS, PHIGS, or X Windows.

It is assumed that the reader is familiar with the 
Am29000's architecture, the calling conventions of 
the C-language, and the software management of the 
Am29000's stack cache. 

For the convenience of the reader, the two header files 
that contain the basic definitions for the example pro- 
grams are given in Appendix A. Excerpts from the list- 
ings are used throughout this document to illustrate the 
explanations. 

Listings for all the 29K™ Family Graphics Primitives
example programs are available (at no charge) on a 
single 5.25" DSHD floppy diskette. To obtain this 
diskette, please contact your local sales office (listed in 
the back of this publication) or call the 29K Hotline at 
1-800-2929AMD in the USA. 



Suggested Reference Materials 

Consult the following AMD reference materials for more 
information on the topics covered in this handbook (also 
see Bibliography): 

• Am29000 User's Manual (order # 10620). This document contains details on the instruction set and register organization of the Am29000.

• 29K Family Data Book (order # 12175). This docu- 
ment contains a great deal of technical information 
about the Am29000, including distinctive characteris- 
tics, a general and a functional description, the system 
diagram, connection diagram, pin designations and 
descriptions, absolute maximum ratings, operational 
ranges, DC characteristics, switching characteristics 
and waveforms, and physical dimensions. 

• Am29000 Memory Design Handbook (order # 10623).
This handbook provides Am29000-memory-system 
design information and specific examples that will be 
helpful in determining how to design a memory sys- 
tem for the best cost/performance ratio available to fit 
your Am29000 application. 

The above-mentioned reference materials can be
obtained by writing or calling:

Advanced Micro Devices, Inc. 
901 Thompson Place 
P.O. Box 3453 
Sunnyvale, CA 94088-3453 
1-800-222-9323 
Bit Maps

A bit map is a two-dimensional array of pixels used to 
contain the information presented on the graphics 
screen. The bit map, or some part of it, is read in sync 
with the screen raster. The data obtained is used to re- 
fresh the screen at a high enough rate to avoid flicker. 
Figure 1 shows the relationship between a bit map and a 
CRT screen. 

The bit map often contains more than one bit of memory for each location on the screen (pixel), which allows gray scale or color to be displayed. The algorithms described in this handbook are documented for monochrome and 32-bit pixels. They can easily be modified for 8 or 16 bits per pixel.



The words in a monochrome bit map appear as shown in 
Figure 2. There are 32 pixels per word, with bit numbers 
decreasing from left to right. 

With 2 to 8 bits per pixel, each byte is 1 pixel. The pixel 
contained in bits 31 through 24 appears to the left of the 
pixel contained in bits 23 through 16, and so forth. 
All bits in a pixel have a common byte address. Higher-numbered bits are more significant than lower-numbered bits.

With 9 to 16 bits per pixel, each half-word is one pixel. The pixel contained in bits 31 through 16 appears to the left of the pixel contained in bits 15 through 0. All bits in a pixel have a common half-word address. Again, higher-numbered bits are more significant than lower-numbered bits.
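The packing rules above can be summarized in a short C sketch. It assumes that each pixel occupies a 1-, 8-, 16-, or 32-bit slot within a 32-bit word, so a 2- to 8-bit pixel is read as its whole byte. The leftmost pixel sits in the most significant bits. The function name pixel_from_word is hypothetical and is not part of the example programs.

```c
#include <stdint.h>

/* Returns pixel number index (0 = leftmost on screen) from a 32-bit
 * bit-map word.  slot_bits is the storage per pixel: 1, 8, 16 or 32.
 * The leftmost pixel occupies the most significant bits. */
static uint32_t pixel_from_word(uint32_t word, unsigned index,
                                unsigned slot_bits)
{
    unsigned per_word = 32u / slot_bits;                 /* pixels per word */
    unsigned shift = (per_word - 1u - index) * slot_bits;
    uint32_t mask = (slot_bits == 32u) ? 0xFFFFFFFFu
                                       : ((1u << slot_bits) - 1u);
    return (word >> shift) & mask;
}
```

For a monochrome word, index 0 reads bit 31. For 8-bit slots, index 0 reads bits 31 through 24.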




Figure 1. Bit Map (bit map and CRT screen)
With 17 to 32 bits per pixel, each word contains 1 pixel. All bits in a pixel have a common word address. Again, higher-numbered bits are more significant than lower-numbered bits. A bit map with 32-bit pixels is shown in Figure 3.

Lower-addressed words always appear on the screen to 
the left of, and above, higher-addressed words. Higher- 
addressed words appear on the screen to the right of, 
and below, lower-addressed words. 
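The following hypothetical C helper illustrates this ordering. It computes the byte address of the storage unit that holds a pixel, counting scan lines from the top of the screen (the lowest addresses). The name pixel_address and its parameters are illustrative. The example programs perform this conversion in S_M1_01 (Appendix A), which may differ in detail; the vector routines, for example, treat increasing y as a move toward lower addresses.

```c
#include <stdint.h>

/* Hypothetical helper: byte address of the storage unit holding the pixel
 * in column col of scan line row, where row 0 is the top line (lowest
 * addresses).  mem_width is the scan-line size in bytes (cf. GP.mem_width);
 * slot_bits is 1, 8, 16 or 32.  For 1-bit pixels the containing word's
 * address is returned and *bit receives the bit number (31 = leftmost);
 * for other sizes *bit is left unchanged. */
static uint32_t pixel_address(uint32_t base, uint32_t mem_width,
                              uint32_t col, uint32_t row,
                              unsigned slot_bits, unsigned *bit)
{
    uint32_t line = base + row * mem_width;  /* lines further down: higher addresses */
    if (slot_bits == 1u) {
        if (bit != 0)
            *bit = 31u - (col % 32u);        /* bit numbers fall left to right */
        return line + (col / 32u) * 4u;      /* 32 pixels per word */
    }
    return line + col * (slot_bits / 8u);    /* byte, half-word or word pixels */
}
```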

The algorithms used in this handbook assume that the 
bit-map memory has byte-write capability. If byte-write 
is not implemented, the algorithms will need extensive 
modification for 2 and 4 pixels-per-word and will execute 
considerably slower. 
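The next sketch shows why missing byte-write support costs time. It assumes a memory that accepts only 32-bit word writes, so storing one 8-bit pixel becomes a read-modify-write of its whole word. The function name is hypothetical. The byte order follows the pixel order described above, with the leftmost pixel in bits 31 through 24.

```c
#include <stdint.h>

/* Sketch, assuming bit-map memory that accepts only 32-bit word writes.
 * byte_offset counts bytes from the start of words[]; byte 0 of each word
 * (bits 31-24) is its leftmost pixel. */
static void store_byte_pixel_rmw(uint32_t *words, uint32_t byte_offset,
                                 uint8_t value)
{
    uint32_t *w = &words[byte_offset / 4u];
    unsigned shift = (3u - (byte_offset % 4u)) * 8u;
    uint32_t word = *w;                       /* read the whole word       */
    word &= ~(0xFFu << shift);                /* clear the target byte     */
    word |= (uint32_t)value << shift;         /* insert the new pixel      */
    *w = word;                                /* write the whole word back */
}
```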

RENDERING 

The sample programs for vector, copy block, string, and 
filled triangle are C-language callable routines written in 
ASM29K assembly language. 

Common Files 

There are two common .h files, G29K_REG.H and 
GRAPH29K.H, that define registers and structures for 
the example programs. These files are contained in Ap- 
pendix A of this document. 

G29K_REG.H 

G29K_REG.H defines the register names and trap definitions that are used by the example programs. Local register usage is summarized in Table 1.



Table 1. Local Register Assignments in G29K_REG.H

Function                             Registers
Input parameters from G29K_Params    lr20 - lr2
S_M1_01 parameters (used by all)     lr24 - lr21
Line internal variables              lr54 - lr25
Block internal variables             lr110 - lr25
Text internal variables              lr35 - lr25
Shading internal variables           lr72 - lr25



The file G29K_REG.H also specifies the programmed 
traps for spill and fill, and two traps for clipping: 

.equ V_SPILL, 64
.equ V_FILL, 65
.equ V_CLIP_SKIP, 100
.equ V_CLIP_STOP, 101

The size of a pixel (in bytes) is defined: 

.equ PIXEL_SIZE, 4 

Five constants define the number of registers that must 
be claimed by the prologue of each function: 

.equ LINE_PRIMITIVE, 53
.equ BLOCK_PRIMITIVE, 109
.equ TEXT_PRIMITIVE, 46
.equ SHADE_PRIMITIVE, 71
.equ FILL_PRIMITIVE, 71
Finally, five macros are defined. The macro ENTER defines three management symbols. The macro PARAM defines a local-register symbol. Neither ENTER nor PARAM generates any code. The macro CLAIM generates the standard C-language calling convention prologue code. The macros RELEASE and LEAVE generate the standard C-language calling convention epilogue code.

Table 2 shows the registers used in vector routines. This 
list can be used as a guide when deleting registers in or- 
der to reduce the value of LINE_PRIMITIVE. Before any 
registers are deleted, the source code to be retained 
should be searched for any usage of the deleted regis- 
ters. Note that any local registers that are "above" 
(higher than) those deleted must be renumbered. 

Another candidate for reduction is BLOCK_PRIMITIVE. There are two approaches that might be useful. First, if only the primitives that do moves (as opposed to arbitrary operations) are used, then the dst.array and its pointers can be removed (registers lr65-lr33). Renumbering the source array will allow BLOCK_PRIMITIVE to be reduced by 33. Second, the constant MAX_WORDS can be reduced. Doing so will permit BLOCK_PRIMITIVE values as shown in Table 3.

GRAPH29K.H 

GRAPH29K.H defines the structure G29K_Params. 
This structure contains all parameters other than the 
fundamental items, such as end points or vertices. The 
structure matches the definitions of the global control 
parameter registers in G29K_REG.H. 

Many of the elements in the structure are not used by 
every routine. For example, the window parameters are 
not used by routines that do not clip. Table 4 shows the 
local registers assigned to structure elements and their 
usage. 

If certain functions will never be used, the storage unique to those functions can be removed from the structure, from the register definitions, and from the appropriate *_PRIMITIVE definition in G29K_REG.H.
Table 2. Local Registers Used in Vector Routines

Variable              Reg       1_01  1_02  2_01  2_02  3_01  3_02  4_01
GP.wnd_min/max_x/y    lr2-5
GP.pxl_value          lr6       X
GP.mem_width          lr7       X
GP.mem_depth          lr8       X
GP.wnd_base           lr9       X
GP.wnd_align          lr10      X
GP.pxl_op_vector      lr11
GP.pxl_in_mask        lr12
GP.pxl_do_mask        lr13
GP.pxl_do_value       lr14
GP.pxl_out_mask       lr15
GP.wid_actual         lr16
GP.pxl_op_code        lr17
GP.mem_base           lr18
GP.wnd_origin_x/y     lr19,20
LP.loc.x              lr21      X
LP.loc.y              lr22      X
LP.loc.addr           lr23      X
LP.loc.align          lr24      X
LP.wid.axial          lr25
LP.wid.side1          lr26
LP.wid.side2          lr27
LP.gen.cover          lr28
LP.gen.delta_p        lr29      X
LP.gen.delta_s        lr30      X
LP.gen.move_p         lr31      X
LP.gen.move_s         lr32      X
LP.gen.p              lr33
LP.gen.s              lr34
LP.gen.min_p          lr35
LP.gen.max_p          lr36
LP.gen.min_s          lr37
LP.gen.max_s          lr38
LP.gen.slope          lr39      X
LP.gen.x_slope        lr40      X
LP.gen.error          lr41      X
LP.gen.x_error        lr42
LP.gen.addr           lr43
LP.gen.try_s          lr44
LP.gen.count          lr45      X
LP.clp.skip_vec       lr46
LP.clp.stop_vec       lr47
LP.clp.others         lr48-54





X 
X 
X 
X 
X 
X 



X 
X 
X 
X 



X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 



X 
X 
X 



X 
X 
X 
X 
X 
X 
X 
X 
X 
X 



X 
X 
X 
X 



X 
X 
X 
X 



X 
X 
X 

X 

X 



X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 



X 
X 
X 
X 



X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 

X 
X 
X 
X 
X 



X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 



X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 



X 
X 
X 



X 
X 
X 
X 
X 
X 
X 
X 
X 
X 



X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 



X 
X 
X 
X 



X 
X 
X 
X 



X 
X 
X 
X 
X 



X 
X 
X 



Table 3. Permitted Values for MAX_WORDS and Associated BLOCK_PRIMITIVE Values

MAX_WORDS    BLOCK_PRIMITIVE
16           109-32 (77)
8            109-48 (61)
4            109-56 (53)
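The three rows of Table 3 fit a simple rule: each word removed from MAX_WORDS frees two local registers, starting from 109 at an implied MAX_WORDS of 32. This rule is inferred from the table and is not stated in G29K_REG.H. The sketch below only restates that arithmetic; block_primitive_for is a hypothetical name.

```c
/* Inferred from Table 3, not taken from G29K_REG.H: two local registers
 * per word of MAX_WORDS, with 109 corresponding to MAX_WORDS = 32. */
static int block_primitive_for(int max_words)
{
    return 109 - 2 * (32 - max_words);
}
```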
Table 4. Local-Register Usage in GRAPH29K.H

Name              Reg       What                     Where (not) used
wnd_min/max_x/y   lr2-5     Clipping window          Used only for clipping
pxl_value         lr6       Current color            Used by all but bitblt
mem_width         lr7       Scan line size           Used by all
mem_depth         lr8       Encoded bits per pixel   Used by all
wnd_base          lr9       See S_M1_01.S            Used by all
wnd_align         lr10      See S_M1_01.S            Used by all
pxl_op_vector     lr11      Address of routine       Used by arbitrary op
pxl_in_mask       lr12      Source input mask        Used by arbitrary op
pxl_do_mask       lr13      Source accept mask       Used by arbitrary op
pxl_do_value      lr14      Source accept value      Used by arbitrary op
pxl_out_mask      lr15      Dest output mask         Used by arbitrary op
wid_actual        lr16      Line width               Used by AA, wide lines
pxl_op_code       lr17      Unused
mem_base          lr18      Unused
wnd_origin_x/y    lr19,20   Unused
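The next sketch makes the register correspondence concrete. It is a hypothetical flat C layout for G29K_Params, with one 32-bit word per local register from lr2 through lr20, in Table 4 order. The real declaration is in GRAPH29K.H (Appendix A) and may group the fields differently; the routines refer to names such as GP.pxl.value and GP.wnd.min_x. The field and type names below are assumptions. The static assertions check the offset the vector routines use when they start their loadm at _G29K_Params + 4*4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat layout: one 32-bit word per local register lr2-lr20,
 * in the order of Table 4.  Not the GRAPH29K.H declaration. */
typedef struct {
    int32_t  wnd_min_x, wnd_max_x, wnd_min_y, wnd_max_y; /* lr2-lr5   */
    uint32_t pxl_value;                                  /* lr6       */
    int32_t  mem_width;                                  /* lr7       */
    int32_t  mem_depth;                                  /* lr8       */
    uint32_t wnd_base;                                   /* lr9       */
    uint32_t wnd_align;                                  /* lr10      */
    uint32_t pxl_op_vector;   /* lr11: 29K code address, kept as 32 bits */
    uint32_t pxl_in_mask;                                /* lr12      */
    uint32_t pxl_do_mask;                                /* lr13      */
    uint32_t pxl_do_value;                               /* lr14      */
    uint32_t pxl_out_mask;                               /* lr15      */
    int32_t  wid_actual;                                 /* lr16      */
    uint32_t pxl_op_code;                                /* lr17      */
    uint32_t mem_base;                                   /* lr18      */
    int32_t  wnd_origin_x, wnd_origin_y;                 /* lr19-lr20 */
} G29K_Params_sketch;

/* loadm at _G29K_Params + 4*4 starts at pxl_value (lr6), skipping the
 * four clipping-window words that only the clipped routines load. */
_Static_assert(offsetof(G29K_Params_sketch, pxl_value) == 4 * 4,
               "pxl_value is word 4");
_Static_assert(sizeof(G29K_Params_sketch) == 19 * 4,
               "lr2..lr20 is 19 words");
```

The clipped routines start their loadm at offset 0 instead, so the same nine consecutive words cover lr2 through lr10.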





Vector Routines 

There are eight complete vector routines, as well as some common subroutines. The reason for having several routines for a type of function (e.g., for drawing lines) is that each routine can be optimized for a specific set of circumstances. It is strongly recommended that the reader study the simplest vector routine (P_L1_01.S) first. All the vector routines are listed in Table 5.

Table 5. Vector Routines

Routine      Function
P_L1_01.S    Single width, set only, not clipped
P_L1_02.S    Single width, set only, clipped
P_L2_01.S    Single width, general operation, not clipped
P_L2_02.S    Single width, general operation, clipped
P_L3_01.S    Anti-aliased, wide lines, not clipped
P_L3_02.S    Anti-aliased, wide lines, clipped
P_L4_01.S    Monochrome, single width, general operation, not clipped
P_L5_01.S    Single width, set only, not clipped, fixed-width map

All line-drawing routines begin with the normal global functions. The function name is declared to be global, the ENTER macro is used to specify that 54 general registers are required, and the routine label occurs:

.global _P_L1_01
ENTER   LINE_PRIMITIVE
_P_L1_01:



The four parameter register names are declared with 
PARAM macros. These assign local register numbers 
above (higher than) the local registers previously 
defined. These parameters are passed in the local reg- 
isters shown below: 



Macro   Register Name   Register Number
PARAM   Start.x         lr54
PARAM   Start.y         lr55
PARAM   Finish.x        lr56
PARAM   Finish.y        lr57


The CLAIM macro is the function prologue. If a spill operation is not necessary, this consists of five instructions. If a spill is necessary, the standard SPILL routine is used, which may involve a Load/Store Multiple operation.

P_L1_01.S 

This section begins with a single-width, unclipped vector with set. "With set" means that the current drawing color will be deposited into each pixel location, without regard to the current contents of the bit map. This function assumes 32-bit pixels.

An example of a subroutine that calls this function is given in the file TEST_L1.C, which is contained in Appendix A.

Five parameters are loaded from the structure G29K_Params:

GP.pxl.value    lr6
GP.mem.width    lr7
GP.mem.depth    lr8
GP.wnd.base     lr9
GP.wnd.align    lr10

const   Temp0,_G29K_Params + 4*4
consth  Temp0,_G29K_Params + 4*4
mtsrim  cr,(5 - 1)
loadm   0,0,GP.pxl.value,Temp0

Routine S_M1_01 (which is contained in Appendix A) is called to convert the Start.x, Start.y pair to a linear address. The linear address is returned in LP.loc.addr.

add     LP.loc.x,Start.x,0
call    ret,S_M1_01
add     LP.loc.y,Start.y,0

The delta in the x direction is computed and put into LP.gen.delta_p. The suffix 'p' stands for primary. It is unknown at this point whether the primary direction will be x or y. The initial error is set correctly for reversibly retraceable lines and put into LP.gen.error. This can be forced to zero if such lines are not desired. The movement value for the primary direction is set to the distance (in bytes) between pixels in the x direction and put into LP.gen.move_p. If Finish.x is less than Start.x, the delta is negated (made positive) and the movement value is set to move in the negative direction.



sub     LP.gen.delta_p,Finish.x,Start.x
sra     LP.gen.error,LP.gen.delta_p,31
jmpf    LP.gen.delta_p,L_01
const   LP.gen.move_p,PIXEL_SIZE
subr    LP.gen.delta_p,LP.gen.delta_p,0
constn  LP.gen.move_p,-PIXEL_SIZE

At label L_01, a similar set of calculations is done, assuming that y will be the secondary direction. The delta in the y direction is calculated and loaded into LP.gen.delta_s. The suffix 's' stands for secondary. The negative of the linear distance between pixels in the y direction is loaded into LP.gen.move_s. This distance is the width of the bit map in bytes. It will be combined with the primary movement to obtain a combined movement. If Finish.y is less than Start.y, the delta is negated and the true value of the memory width is loaded into the secondary-movement variable, LP.gen.move_s.



L_01:
sub     LP.gen.delta_s,Finish.y,Start.y
jmpf    LP.gen.delta_s,L_02
subr    LP.gen.move_s,GP.mem.width,0
subr    LP.gen.delta_s,LP.gen.delta_s,0
add     LP.gen.move_s,GP.mem.width,0



At label L_02, the code decides whether the primary movement is, in fact, in the x direction. The secondary movement value is combined with the primary movement value, and the result is put into LP.gen.move_s. If delta_p is not greater than or equal to delta_s, then the primary direction is y. The values in LP.gen.delta_p and LP.gen.delta_s are swapped with three exclusive-or (XOR) operations, and a corrected primary-movement value is computed.



L_02:
cpge    Temp0,LP.gen.delta_p,LP.gen.delta_s
jmpt    Temp0,L_03
add     LP.gen.move_s,LP.gen.move_p,LP.gen.move_s
xor     LP.gen.delta_p,LP.gen.delta_p,LP.gen.delta_s
xor     LP.gen.delta_s,LP.gen.delta_p,LP.gen.delta_s
xor     LP.gen.delta_p,LP.gen.delta_p,LP.gen.delta_s
sub     LP.gen.move_p,LP.gen.move_s,LP.gen.move_p
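The effect of the L_02 sequence can be sketched in C as follows. Three XORs exchange two registers without a temporary. Subtracting the old primary move from the combined move then recovers the pure y move. The function names swap_xor and make_y_primary are hypothetical.

```c
#include <stdint.h>

/* Exchange two values with three XORs, as the L_02 code does with
 * LP.gen.delta_p and LP.gen.delta_s.  a and b must not alias. */
static void swap_xor(int32_t *a, int32_t *b)
{
    *a ^= *b;                 /* xor delta_p,delta_p,delta_s */
    *b ^= *a;                 /* xor delta_s,delta_p,delta_s */
    *a ^= *b;                 /* xor delta_p,delta_p,delta_s */
}

/* When y turns out to be primary: swap the deltas, then recover the pure
 * y move from the combined move (move_p + move_s) held in move_s. */
static void make_y_primary(int32_t *delta_p, int32_t *delta_s,
                           int32_t *move_p, int32_t combined_move_s)
{
    swap_xor(delta_p, delta_s);
    *move_p = combined_move_s - *move_p;   /* sub move_p,move_s,move_p */
}
```

With move_p = 4 and a y move of -1280, the combined move is -1276, and make_y_primary leaves move_p at -1280.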



At label L_03, the code determines which half of the first octant contains the line to be drawn. This is shown in Figure 4.



Figure 4. Partial Octant (lower half of the first octant)



L_03:
sub     Temp1,LP.gen.delta_p,LP.gen.delta_s
cpgeu   Temp0,Temp1,LP.gen.delta_s
jmpt    Temp0,L_04
sub     LP.gen.count,LP.gen.delta_p,1
add     LP.gen.delta_s,Temp1,0
xor     LP.gen.move_p,LP.gen.move_p,LP.gen.move_s
xor     LP.gen.move_s,LP.gen.move_p,LP.gen.move_s
xor     LP.gen.move_p,LP.gen.move_p,LP.gen.move_s
xor     LP.gen.error,LP.gen.error,



By drawing the vector as if it were always in the same half of an octant (that is, the bottom half), it is possible to maximize the use of the Branch Target Cache™ (BTC). If the first jump in the generation loop is usually taken, the routine will often execute entirely out of the BTC.

The difference in the deltas is computed into Temp1 and is compared to the delta in the secondary direction. If the difference is less than the secondary delta, the line is in the top half of the first octant. This is equivalent to saying that more combined moves will be required than moves in the primary direction only. If this is the case, the code will adjust the parameters so that the line will be generated as though it were in the bottom half. In either case, the number of iterations necessary in the generation loop is computed and placed into LP.gen.count.

If the vector is in the top half of the first octant, the secon- 
dary delta is set to the difference of the deltas previously 
computed. The primary and secondary move values are 
swapped, and the initial error term is reversed. 

At label L_04, the primary and secondary error incre- 
ments are calculated. These error increments are ex- 
actly those described in Bresenham's algorithm, namely 
"2*delta_s" and "2*delta_s - 2*delta_p." The initial error
term is calculated. It is now possible to begin to draw the 
vector. 



L_04:
sll     LP.gen.slope,LP.gen.delta_s,1
sll     LP.gen.x_slope,LP.gen.delta_p,1
sub     LP.gen.x_slope,LP.gen.slope,LP.gen.x_slope
add     LP.gen.error,LP.gen.error,LP.gen.slope
sub     LP.gen.error,LP.gen.error,LP.gen.delta_p
jmpt    LP.gen.count,L_07
sub     LP.gen.count,LP.gen.count,1

The variable LP.gen.count is tested to determine whether to enter the actual generation loop or to draw a single pixel; at least one pixel is always drawn.

In the generation loop, for each pixel in the primary direction, a pixel is drawn and a direction for the next pixel is chosen. There are two basic paths through the loop, not including the end point. The path usually taken involves movement in the primary direction only.



L_05:
jmpt     (primary) to L_06
store    pixel
L_06:
add      primary inc to error
jmpfdec  count, jump L_05
add      primary move to addr



The alternate path is taken when movement in both directions is required.

L_05:
do not jump (secondary)
store    pixel
add      secondary inc to error
jmpfdec  count, jump L_05
add      combined move to addr

In this case (movement in both directions), there are also five instructions per pixel, but the routine does not execute entirely out of the BTC. In a system without single-cycle instruction burst, or with five or more cycles for the first instruction, executing entirely from the BTC is advantageous because the memory cannot keep up with the processor. In fact, in a system with fast access and single-cycle burst, the routine is slightly slower due to the extra calculations necessary to place the line into the proper half of the octant.



L_05:
jmpt    LP.gen.error,L_06
store   0,0,GP.pxl.value,LP.loc.addr
add     LP.gen.error,LP.gen.error,LP.gen.x_slope
jmpfdec LP.gen.count,L_05
add     LP.loc.addr,LP.loc.addr,LP.gen.move_s
jmp     L_08
store   0,0,GP.pxl.value,LP.loc.addr

L_06:
add     LP.gen.error,LP.gen.error,LP.gen.slope
jmpfdec LP.gen.count,L_05
add     LP.loc.addr,LP.loc.addr,LP.gen.move_p

L_07:
store   0,0,GP.pxl.value,LP.loc.addr

L_08:



This loop is five instructions per pixel executing entirely 
out of the Branch Target Cache. 



In either case, when the count has expired, the last pixel 
is drawn and the routine exits. 

RELEASE 

nop 

LEAVE 
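For checking results, a plain C reference model of what P_L1_01 draws may be useful. This is a sketch, not the handbook's code. It assumes 32-bit pixels held in an array fb[], with row 0 (lowest address) at the top of the screen and y increasing upward, so a move in +y subtracts one row, matching the sign of LP.gen.move_s above. Moves are counted in pixels rather than bytes. The model keeps the same error increments (2*delta_s and 2*delta_s - 2*delta_p) and the sign-bit bias for reversibly retraceable lines. It omits the half-octant fold used to favor the BTC, so tie-breaking on exact midpoints is not guaranteed to match the assembly bit for bit. The name draw_line_set is hypothetical.

```c
#include <stdint.h>

/* Sketch of a reference model for P_L1_01 (set only, not clipped).
 * Assumptions: fb[] holds 32-bit pixels row by row, row 0 (lowest
 * address) is the top scan line, y increases upward, and both end
 * points lie inside the width x height bit map. */
static void draw_line_set(uint32_t *fb, int width, int height,
                          int sx, int sy, int fx, int fy, uint32_t value)
{
    int dp = fx - sx;                   /* primary delta (x for now) */
    int ds = fy - sy;                   /* secondary delta (y for now) */
    int err = (dp < 0) ? -1 : 0;        /* sign-bit bias, like sra ...,31 */
    int move_p = 1;                     /* +x: next pixel in the array */
    int move_s = -width;                /* +y: one scan line toward lower addresses */

    if (dp < 0) { dp = -dp; move_p = -1; }
    if (ds < 0) { ds = -ds; move_s = width; }
    move_s += move_p;                   /* combined (diagonal) move */
    if (dp < ds) {                      /* y is really the primary direction */
        int t = dp; dp = ds; ds = t;
        move_p = move_s - move_p;       /* recover the pure y move */
    }

    int slope = 2 * ds;                 /* error increment, primary move only */
    int x_slope = 2 * ds - 2 * dp;      /* error increment, combined move */
    err += slope - dp;                  /* initial error term */

    long addr = (long)(height - 1 - sy) * width + sx;
    for (int count = dp; count > 0; count--) {
        fb[addr] = value;               /* store pixel */
        if (err < 0) { err += slope;   addr += move_p; }
        else         { err += x_slope; addr += move_s; }
    }
    fb[addr] = value;                   /* last pixel: delta_p + 1 in all */
}
```

Like the assembly, the model always stores at least one pixel, and it stores delta_p + 1 pixels in total.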

P_L1_02.S

This C-language callable routine draws single-width, set 
vectors with clipping. The routine begins with the normal 
global functions. 



Graphics Primitives 



Nine parameters are loaded from the structure G29K_Params:

GP.wnd.min_x    lr2
GP.wnd.max_x    lr3
GP.wnd.min_y    lr4
GP.wnd.max_y    lr5
GP.pxl.value    lr6
GP.mem.width    lr7
GP.mem.depth    lr8
GP.wnd.base     lr9
GP.wnd.align    lr10

const   Temp0,_G29K_Params
consth  Temp0,_G29K_Params
mtsrim  cr,(9 - 1)
loadm   0,0,GP.wnd.min_x,Temp0

Routine S_M1_01 is called to convert the Start.x, Start.y pair to a linear address. The linear address is returned in LP.loc.addr.

add     LP.loc.x,Start.x,0
call    ret,S_M1_01
add     LP.loc.y,Start.y,0



The routine assumes that x will be the primary direction and y will be the secondary direction. The value delta_p is accordingly calculated from the x end points, and delta_s is calculated from the y end points. The routine calculates delta_p by subtracting Start.x from Finish.x. The initial error is set from the sign bit of this delta, and a test is made to determine whether Finish.x is greater than or equal to Start.x. In either case, delta_s is calculated by subtracting Start.y from Finish.y.

sub     LP.gen.delta_p,Finish.x,Start.x
sra     LP.gen.error,LP.gen.delta_p,31
jmpf    LP.gen.delta_p,L_01
sub     LP.gen.delta_s,Finish.y,Start.y
subr    LP.gen.delta_p,LP.gen.delta_p,0
constn  LP.gen.move_p,-PIXEL_SIZE
subr    LP.gen.p,Start.x,0
subr    LP.gen.min_p,GP.wnd.max_x,0
jmp     L_02
subr    LP.gen.max_p,GP.wnd.min_x,0

If Finish.x is not equal to or greater than Start.x, delta_p is negated. The pixel offset (the move value) in the primary direction is set to the negative of the pixel size (the distance between pixels in bytes in the horizontal direction). The primary clipping position, LP.gen.p, is set to the negative of Start.x. The minimum and maximum clipping boundaries are set to the negatives of the maximum and minimum x values of the clipping window, respectively.

If Finish.x is greater than or equal to Start.x, the logic at L_01 sets the primary movement value to the distance between pixels (in bytes) in the x direction and sets LP.gen.p to Start.x. The minimum and maximum clipping boundaries are set to the minimum and maximum x values of the clipping window, respectively.



L_01:
const   LP.gen.move_p,PIXEL_SIZE
add     LP.gen.p,Start.x,0
add     LP.gen.min_p,GP.wnd.min_x,0
add     LP.gen.max_p,GP.wnd.max_x,0



At label L_02, the Clipping Skip Vector is set (partially) to the label Loop. This is completed during the delay instruction of a jump just after label L_04. The secondary delta is tested to determine whether Finish.y was equal to or greater than Start.y. If not, the secondary delta is negated, and the secondary movement value is set to the memory width (that is, the number of bytes in the linear address space between vertically adjacent pixels). The secondary initial clipping location is set to the negative of Start.y. The minimum and maximum clipping boundaries are set to the negatives of the maximum and minimum y values of the clipping window, respectively.



L_02:
jmpf    LP.gen.delta_s,L_03
const   LP.clp.skip_vec,Loop
subr    LP.gen.delta_s,LP.gen.delta_s,0
add     LP.gen.move_s,GP.mem.width,0
subr    LP.gen.s,Start.y,0
subr    LP.gen.min_s,GP.wnd.max_y,0
jmp     L_04
subr    LP.gen.max_s,GP.wnd.min_y,0



If Finish.y was greater than or equal to Start.y, the code at label L_03 sets the secondary movement value to the negative of the memory width and sets the secondary clipping initial value to Start.y. The minimum and maximum secondary clipping boundaries are set to the minimum and maximum y values of the window, respectively.

L_03:
subr    LP.gen.move_s,GP.mem.width,0
add     LP.gen.s,Start.y,0
add     LP.gen.min_s,GP.wnd.min_y,0
add     LP.gen.max_s,GP.wnd.max_y,0

At label L_04, the primary delta is compared to the sec- 
ondary delta. The Clipping Skip Vector is completely set 
to the label Loop. If the primary delta was greater than or 
equal to the secondary delta, then the x direction is the 
primary direction and the code continues at label L_05. 
If not, then the primary direction is, in fact, the y direc- 
tion, and some swapping is necessary. 
The following five sets of values are exchanged:

LP.gen.delta_p    and    LP.gen.delta_s
LP.gen.move_p     and    LP.gen.move_s
LP.gen.p          and    LP.gen.s
LP.gen.min_p      and    LP.gen.min_s
LP.gen.max_p      and    LP.gen.max_s



That is, the primary and secondary deltas, movement 
values, initial clipping positions, and clipping boundaries 
are exchanged. 

L_04:
cpge    Temp0,LP.gen.delta_p,LP.gen.delta_s
jmpt    Temp0,L_05
consth  LP.clp.skip_vec,Loop
xor     LP.gen.delta_p,LP.gen.delta_p,LP.gen.delta_s
xor     LP.gen.delta_s,LP.gen.delta_p,LP.gen.delta_s
xor     LP.gen.delta_p,LP.gen.delta_p,LP.gen.delta_s
xor     LP.gen.move_p,LP.gen.move_p,LP.gen.move_s
xor     LP.gen.move_s,LP.gen.move_p,LP.gen.move_s
xor     LP.gen.move_p,LP.gen.move_p,LP.gen.move_s
xor     LP.gen.p,LP.gen.p,LP.gen.s
xor     LP.gen.s,LP.gen.p,LP.gen.s
xor     LP.gen.p,LP.gen.p,LP.gen.s
xor     LP.gen.min_p,LP.gen.min_p,LP.gen.min_s
xor     LP.gen.min_s,LP.gen.min_p,LP.gen.min_s
xor     LP.gen.min_p,LP.gen.min_p,LP.gen.min_s
xor     LP.gen.max_p,LP.gen.max_p,LP.gen.max_s
xor     LP.gen.max_s,LP.gen.max_p,LP.gen.max_s
xor     LP.gen.max_p,LP.gen.max_p,LP.gen.max_s

At label L_05, the primary and secondary error incre- 
ments are calculated. These error increments are ex- 
actly those described in Bresenham's algorithm, namely 
"2*delta_s" and "2*delta_s - 2*delta_p." The initial error
term is calculated, and the Clipping Stop Vector is set.
Now the vector can be drawn. 



L_05:
sll     LP.gen.slope,LP.gen.delta_s,1
sll     LP.gen.x_slope,LP.gen.delta_p,1
add     LP.gen.error,LP.gen.error,LP.gen.slope
sub     LP.gen.error,LP.gen.error,LP.gen.delta_p
const   LP.clp.stop_vec,Stop
consth  LP.clp.stop_vec,Stop
jmp     L_08
sub     LP.gen.count,LP.gen.delta_p,1

Loop:
jmpfdec LP.gen.count,L_06
nop
jmp     Stop
nop

If the vector is a single pixel, the routine jumps to L_08, 
where it is checked against the clipping boundaries. If 
the single pixel is inside the clipping window, it will be 
drawn. 

The pixel count is calculated from the primary delta, and 
the routine falls through to label Loop. 

Depending on whether the movement will be in the primary direction or the combined direction, the code will take one of two paths through the generation loop. If the movement is in the primary direction only, the path is as follows:

L_06:
jump to L_07
add prim inc to error
L_07:
add prim move to addr
inc primary clipping addr
L_08:
assert pixel inside clipping
dec loop count, jump L_06
store value into addr
Stop:
RELEASE
nop
LEAVE

This movement in the primary direction requires ten instructions per pixel, including the four assert instructions that mechanize the clipping and the one store.

If the movement is in the combined direction, the path is as follows:

L_06:
do not jump to L_07
add prim inc to error
sub x error incr from error
add secondary move to addr
inc secondary clipping addr
L_07:
add prim move to addr
inc primary clipping addr
L_08:
assert pixel inside clipping
dec loop count, jump L_06
store value into addr
Stop:
RELEASE
nop
LEAVE

This movement in the combined direction requires 13 instructions per pixel, including the four assert instructions that mechanize the clipping and the one store.

Clipping is mechanized with a set of four assert instruc- 
tions, as shown in Figure 5. 

The line is being drawn from left to right (that is, in the first quadrant). The current clipping locations are initially set to the Start.x and Start.y values. The clipping boundaries are set to the clipping window values.

If, as in Figure 5, Start.x lies to the left of the clipping window, the first assert will fail when the generation loop begins, because LP.gen.p will not be greater than or equal to LP.gen.min_p. Since the Clipping Skip Vector has been set to Loop, the routine will go back to the top and continue with the next point along the line. In this case, the loop generation count is decremented at the top of the loop. This point is not drawn.

When LP.gen.p has been incremented to a value greater than or equal to LP.gen.min_p, the assertions will succeed. The routine will now execute the store instructions, and each pixel will be written. The generation loop count is decremented at the bottom of the loop.

When LP.gen.p has been incremented to a value greater than LP.gen.max_p, the third assertion will fail. Since the Clipping Stop Vector points to the label Stop, the generation loop will exit as though the count were negative, and the iterations will cease.
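The skip/stop behavior of the four assert instructions can be modeled in C. This sketch assumes that p and s, together with their bounds, have already been negated for lines running in the negative direction, as the setup code above does, so both positions only increase. The names ClipAction, clip_test, and pixels_drawn are hypothetical. In the assembly, a failed asge traps through V_CLIP_SKIP to the Loop label, and a failed asle traps through V_CLIP_STOP to Stop.

```c
/* Hypothetical model of the four assert instructions at L_08 of P_L1_02. */
typedef enum { CLIP_DRAW, CLIP_SKIP, CLIP_STOP } ClipAction;

static ClipAction clip_test(int p, int s, int min_p, int max_p,
                            int min_s, int max_s)
{
    if (p < min_p) return CLIP_SKIP;    /* asge V_CLIP_SKIP,p,min_p */
    if (s < min_s) return CLIP_SKIP;    /* asge V_CLIP_SKIP,s,min_s */
    if (p > max_p) return CLIP_STOP;    /* asle V_CLIP_STOP,p,max_p */
    if (s > max_s) return CLIP_STOP;    /* asle V_CLIP_STOP,s,max_s */
    return CLIP_DRAW;                   /* fall through to the store */
}

/* Walks a horizontal run of n + 1 points starting at p0 with s fixed and
 * counts the pixels that would be stored before the stop vector fires. */
static int pixels_drawn(int p0, int n, int s, int min_p, int max_p,
                        int min_s, int max_s)
{
    int drawn = 0;
    for (int i = 0; i <= n; i++) {
        ClipAction a = clip_test(p0 + i, s, min_p, max_p, min_s, max_s);
        if (a == CLIP_STOP)
            break;                      /* as if the count had gone negative */
        if (a == CLIP_DRAW)
            drawn++;                    /* CLIP_SKIP: point not drawn, keep going */
    }
    return drawn;
}
```

Consider a line drawn right to left from x = 8 toward x = 0 against a window spanning x = 3 to 6. The setup code would use p0 = -8, min_p = -6, and max_p = -3, and the model reports four pixels, the same as for the equivalent left-to-right run.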



L_06:
jmpt    LP.gen.error,L_07
add     LP.gen.error,LP.gen.error,LP.gen.slope
sub     LP.gen.error,LP.gen.error,LP.gen.x_slope
add     LP.loc.addr,LP.loc.addr,LP.gen.move_s
add     LP.gen.s,LP.gen.s,1

L_07:
add     LP.loc.addr,LP.loc.addr,LP.gen.move_p
add     LP.gen.p,LP.gen.p,1

L_08:
asge    V_CLIP_SKIP,LP.gen.p,LP.gen.min_p
asge    V_CLIP_SKIP,LP.gen.s,LP.gen.min_s
asle    V_CLIP_STOP,LP.gen.p,LP.gen.max_p
asle    V_CLIP_STOP,LP.gen.s,LP.gen.max_s
jmpfdec LP.gen.count,L_06
store   0,0,GP.pxl.value,LP.loc.addr



Figure 5. Clipping (frame buffer with the clipping window bounded by gen.min_p, gen.max_p, gen.min_s, and gen.max_s; the line runs from Start.x,y to Finish.x,y)
Stop:

RELEASE 

nop 

LEAVE 

P_L2_01.S 

This C-language callable routine is used to draw a single-width, unclipped line. The writing operation can be any operation the user wishes.

The operation (e.g., XOR, AND, ADD, etc.) is specified 
by the user by providing the address of the routine that 
performs the operation. The example user-supplied rou- 
tine, O1_02.S (included on the distribution diskette) 
XORs the current contents of the addressed pixel with 
the pixel value. 
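A user-supplied operation can be pictured as a function pointer applied to each pixel. The C sketch below is only a model. The real routine, such as O1_02.S, is ASM29K code reached through GP.pxl.op_vector and has its own register interface. The typedef pixel_op_fn and the functions op_xor, op_and, and apply_op are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: the operation receives the current destination
 * pixel and the drawing value, and returns the pixel to write back. */
typedef uint32_t (*pixel_op_fn)(uint32_t dst, uint32_t pxl_value);

static uint32_t op_xor(uint32_t dst, uint32_t v) { return dst ^ v; }
static uint32_t op_and(uint32_t dst, uint32_t v) { return dst & v; }

/* One read-modify-write of a pixel through the selected operation. */
static void apply_op(uint32_t *pixel, uint32_t pxl_value, pixel_op_fn op)
{
    *pixel = op(*pixel, pxl_value);
}
```

Applying op_xor twice with the same value restores the original pixel. This property makes XOR a common choice for drawing and then erasing rubber-band lines.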

The routine P_L2_01.S begins with the normal global functions. Ten parameters are loaded from the structure G29K_Params.



GP.pxl.value      lr6
GP.mem.width      lr7
GP.mem.depth      lr8
GP.wnd.base       lr9
GP.wnd.align      lr10
GP.pxl.op_vec     lr11
GP.pxl.in_mask    lr12
GP.pxl.do_mask    lr13
GP.pxl.do_value   lr14
GP.pxl.out_mask   lr15

        const   Temp0, _G29K_Params + 4*4
        consth  Temp0, _G29K_Params + 4*4
        mtsrim  cr, (10 - 1)
        loadm   0, 0, GP.pxl.value, Temp0



Routine S_M1_01 is called to convert the Start.x, Start.y pair to a linear address. The linear address is returned in LP.loc.addr.

        add     LP.loc.x, Start.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Start.y, 0

The routine assumes that the x direction will be the primary direction, and that the y direction will be secondary. It calculates the delta for the primary direction by subtracting Start.x from Finish.x. The error adjustment for reversibly retraceable lines is set from the sign bit of this delta (0 for a left-to-right line, -1 for a right-to-left line). The primary movement parameter is set to PIXEL_SIZE, which is the distance in bytes between horizontally adjacent pixels. If Finish.x is greater than or equal to Start.x, the code jumps to L_01. If Finish.x is less than Start.x, the value for the primary delta is negated, and the primary movement parameter is set to negative PIXEL_SIZE. (The line will be drawn from right to left.)

        sub     LP.gen.delta_p, Finish.x, Start.x
        sra     LP.gen.error, LP.gen.delta_p, 31
        jmpf    LP.gen.delta_p, L_01
        const   LP.gen.move_p, PIXEL_SIZE
        subr    LP.gen.delta_p, LP.gen.delta_p, 0
        constn  LP.gen.move_p, -PIXEL_SIZE

At label L_01, the secondary delta is calculated by subtracting Start.y from Finish.y. The secondary movement parameter is set to the negative of the memory width (that is, the distance between vertically adjacent pixels). If Finish.y is less than Start.y, the secondary delta is negated and the secondary movement parameter is set to the value of the memory width.



L_01:
        sub     LP.gen.delta_s, Finish.y, Start.y
        jmpf    LP.gen.delta_s, L_02
        subr    LP.gen.move_s, GP.mem.width, 0
        subr    LP.gen.delta_s, LP.gen.delta_s, 0
        add     LP.gen.move_s, GP.mem.width, 0
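The set-up performed before label L_02 can be sketched in C as follows. The struct and function names are hypothetical. PIXEL_SIZE and the memory width are passed in as plain byte counts. As in the listing, a step toward larger y is modeled as subtracting the memory width.

```c
#include <assert.h>

/* Hypothetical C model of the set-up code in P_L2_01.S before L_02. */
typedef struct {
    int delta_p, delta_s;   /* absolute deltas                     */
    int move_p, move_s;     /* address steps along each axis       */
    int error;              /* 0 or -1: retrace adjustment         */
} line_setup;

static line_setup setup_axes(int sx, int sy, int fx, int fy,
                             int pixel_size, int mem_width)
{
    line_setup ls;
    assert(pixel_size > 0 && mem_width > 0);
    ls.delta_p = fx - sx;
    ls.error   = ls.delta_p < 0 ? -1 : 0;   /* what sra ...,31 produces */
    ls.move_p  = pixel_size;
    if (ls.delta_p < 0) {                   /* right-to-left line        */
        ls.delta_p = -ls.delta_p;
        ls.move_p  = -pixel_size;
    }
    ls.delta_s = fy - sy;
    ls.move_s  = -mem_width;                /* toward larger y           */
    if (ls.delta_s < 0) {
        ls.delta_s = -ls.delta_s;
        ls.move_s  = mem_width;
    }
    return ls;
}
```

The C version uses a conditional for the error adjustment. A right shift of a negative int in C is implementation-defined, whereas sra on the Am29000 always copies the sign bit.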



At label L_02, the primary and secondary deltas are compared. If the primary delta is greater than or equal to the secondary delta, then the x axis is the primary axis. The combined movement value is computed by adding the primary movement to the original secondary movement.



L_02:
        cpge    Temp0, LP.gen.delta_p, LP.gen.delta_s
        jmpt    Temp0, L_03
        add     LP.gen.move_s, LP.gen.move_p, LP.gen.move_s
        xor     LP.gen.delta_p, LP.gen.delta_p, LP.gen.delta_s
        xor     LP.gen.delta_s, LP.gen.delta_p, LP.gen.delta_s
        xor     LP.gen.delta_p, LP.gen.delta_p, LP.gen.delta_s
        sub     LP.gen.move_p, LP.gen.move_s, LP.gen.move_p



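If the primary delta is less than the secondary delta, the deltas are swapped using three XOR instructions. The primary movement parameter is then corrected: subtracting the original primary movement from the combined movement leaves the original secondary movement, which becomes the new primary movement.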
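The exchange at L_02 relies on the classic three-XOR swap. It also relies on the fact that subtracting the primary movement from the combined movement leaves the secondary movement. Here is a minimal C sketch; the function name is hypothetical:

```c
#include <assert.h>

/* Sketch of L_02 in P_L2_01.S. On entry *move_s holds the secondary
   step only. The combined step is formed first (the add sits in the
   delay slot of jmpt, so it always executes). If y turns out to be the
   primary axis, the deltas are exchanged with three XORs and the new
   primary step is recovered from the combined step. */
static void choose_primary(int *delta_p, int *delta_s,
                           int *move_p, int *move_s)
{
    *move_s = *move_p + *move_s;        /* combined movement          */
    if (*delta_p >= *delta_s)           /* cpge / jmpt: x is primary  */
        return;
    assert(delta_p != delta_s);         /* XOR swap needs two objects */
    *delta_p ^= *delta_s;
    *delta_s ^= *delta_p;
    *delta_p ^= *delta_s;
    *move_p = *move_s - *move_p;        /* old secondary step         */
}
```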

At label L_03, the primary and secondary error increments are calculated. These error increments are derived from those described in Bresenham's algorithm. The initial error term is calculated. The vector can now be drawn.



L_03:
        sll     LP.gen.slope, LP.gen.delta_s, 1
        sll     LP.gen.x_slope, LP.gen.delta_p, 1
        add     LP.gen.error, LP.gen.error, LP.gen.slope
        sub     LP.gen.error, LP.gen.error, LP.gen.delta_p
        calli   ret, GP.pxl.op_vec
        sub     LP.gen.count, LP.gen.delta_p, 1
        jmpt    LP.gen.count, L_06
        sub     LP.gen.count, LP.gen.count, 1

The first pixel is always drawn, even if it is the only pixel. This routine draws pixels by executing a calli instruction through GP.pxl.op_vec. This pointer must have been set up by the caller in the structure G29K_Params. The routine performs whatever operation (XOR, Add, etc.) is specified by the caller.

Routine O1_02.S will be used as an example. It has access to the register names (at assemble time) because it includes G29K_REG.H. When it is called, LP.loc.addr contains the linear address of the pixel location, and GP.pxl.value contains the current pixel value (drawing color). The routine simply reads the addressed pixel into a temporary register and XORs it with the current drawing color. The result is stored back into the current pixel location and the routine exits.

In routine P_L2_01.S, the code computes the number of pixels left to write and determines whether there are more. If so, the count is decremented, and the generation loop is entered.
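Taken together, the behavior of P_L2_01.S can be approximated in C as below. This is a sketch under stated assumptions, not a translation of the listing. The following are all hypothetical:

- the 8 x 8 frame buffer, with one 32-bit word per pixel (so PIXEL_SIZE is one element);
- the helper names fb_addr and draw_line;
- the function-pointer type that stands in for GP.pxl.op_vec.

An ordinary temporary replaces the XOR swap. The error test mirrors the listing: a negative error means a move in the primary direction only.

```c
#include <assert.h>
#include <stdint.h>

#define FB_W 8          /* hypothetical frame buffer: 8 x 8 pixels, */
#define FB_H 8          /* one uint32_t per pixel                    */

/* Stand-in for the routine whose address is passed in GP.pxl.op_vec. */
typedef void (*pixel_op)(uint32_t *fb, int addr, uint32_t value);

/* XOR the pixel with the drawing color, in the spirit of O1_02.S. */
static void op_xor(uint32_t *fb, int addr, uint32_t value)
{
    fb[addr] ^= value;
}

/* Row 0 is at the bottom, so a step toward larger y subtracts FB_W. */
static int fb_addr(int x, int y)
{
    return (FB_H - 1 - y) * FB_W + x;
}

/* Unclipped single-width line; both end points must be in the buffer. */
static void draw_line(uint32_t *fb, int sx, int sy, int fx, int fy,
                      uint32_t value, pixel_op op)
{
    assert(sx >= 0 && sx < FB_W && fx >= 0 && fx < FB_W);
    assert(sy >= 0 && sy < FB_H && fy >= 0 && fy < FB_H);
    int delta_p = fx - sx, delta_s = fy - sy;
    int error = delta_p < 0 ? -1 : 0;       /* retrace adjustment      */
    int move_p = 1, move_s = -FB_W;
    if (delta_p < 0) { delta_p = -delta_p; move_p = -1; }
    if (delta_s < 0) { delta_s = -delta_s; move_s = FB_W; }
    move_s += move_p;                        /* combined movement       */
    if (delta_p < delta_s) {                 /* y becomes primary       */
        int t = delta_p; delta_p = delta_s; delta_s = t;
        move_p = move_s - move_p;
    }
    int slope = 2 * delta_s, x_slope = 2 * delta_p;
    error += slope - delta_p;                /* initial error term      */
    int addr = fb_addr(sx, sy);
    op(fb, addr, value);                     /* first pixel always drawn */
    for (int count = delta_p; count > 0; count--) {
        if (error < 0) {                     /* primary move only       */
            addr += move_p;
        } else {                             /* combined move           */
            addr += move_s;
            error -= x_slope;
        }
        error += slope;
        op(fb, addr, value);
    }
}
```

In the test, drawing the line from (0,0) to (4,2) in either direction lights the same five pixels. This is the effect the -1 error adjustment for right-to-left lines is meant to have.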

There are two paths through the generation loop. In the 
case where the move is in the primary direction only, the 
path through the loop is: 

L_04:
        jump to L_05
        add primary adjust to error
L_05:
        call the operation routine
        add the primary move to addr
        dec loop count, jmp L_04
        (nop for delay slot)

This (movement in the primary direction only) requires 
six instructions plus the operator routine. 

In the case where the move is in the combined direction, the path through the loop is:

L_04:
        (do not) jump
        add primary adjust to error
        call the operator routine
        add combined adjust to addr
        dec loop count, jmp L_04
        subtract out primary error

This movement in the combined direction requires six instructions plus the operator routine.

L_04:
        jmpt    LP.gen.error, L_05
        add     LP.gen.error, LP.gen.error, LP.gen.slope
        calli   ret, GP.pxl.op_vec
        add     LP.loc.addr, LP.loc.addr, LP.gen.move_s
        jmpfdec LP.gen.count, L_04
        sub     LP.gen.error, LP.gen.error, LP.gen.x_slope
        jmp     L_06
        nop
L_05:
        calli   ret, GP.pxl.op_vec
        add     LP.loc.addr, LP.loc.addr, LP.gen.move_p
        jmpfdec LP.gen.count, L_04
        nop

P_L2_02.S

This C-language callable routine is used to draw single-width clipped lines with any pixel operation.

The operation (e.g., XOR, AND, ADD, etc.) is specified by the user by providing the address of the routine that performs the operation. The routine O1_02.S XORs the current contents of the addressed pixel with the pixel value.

P_L2_02.S begins with the normal global functions. Fourteen parameters are loaded from the structure G29K_Params.

GP.wnd.min_x      lr2
GP.wnd.max_x      lr3
GP.wnd.min_y      lr4
GP.wnd.max_y      lr5
GP.pxl.value      lr6
GP.mem.width      lr7
GP.mem.depth      lr8
GP.wnd.base       lr9
GP.wnd.align      lr10
GP.pxl.op_vec     lr11
GP.pxl.in_mask    lr12
GP.pxl.do_mask    lr13
GP.pxl.do_value   lr14
GP.pxl.out_mask   lr15

        const   Temp0, _G29K_Params
        consth  Temp0, _G29K_Params
        mtsrim  cr, (14 - 1)
        loadm   0, 0, GP.wnd.min_x, Temp0
Routine S_M1_01 is called to convert the Start.x, Start.y pair to a linear address. The linear address is returned in LP.loc.addr.

        add     LP.loc.x, Start.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Start.y, 0

This routine assumes that the primary direction will be x and the secondary direction will be y. This is tested at label L_04.

        sub     LP.gen.delta_p, Finish.x, Start.x
        sra     LP.gen.error, LP.gen.delta_p, 31
        jmpf    LP.gen.delta_p, L_01
        sub     LP.gen.delta_s, Finish.y, Start.y
        subr    LP.gen.delta_p, LP.gen.delta_p, 0
        constn  LP.gen.move_p, -PIXEL_SIZE
        subr    LP.gen.p, Start.x, 0
        subr    LP.gen.min_p, GP.wnd.max_x, 0
        jmp     L_02
        subr    LP.gen.max_p, GP.wnd.min_x, 0

The routine computes the primary delta by subtracting Start.x from Finish.x. The sign bit is placed in the error variable to allow for reversibly retraceable lines.

If Finish.x is less than Start.x, the line is drawn from right to left. In this case:

- the primary delta is negated;
- the primary movement is set to the negative of PIXEL_SIZE, which is the distance between horizontally adjacent pixels, moving from right to left;
- the current primary clipping location is set to the negative of Start.x;
- the primary minimum and maximum clipping boundaries are set to the negatives of the maximum and minimum x window boundaries, respectively.

Negating the location and the boundaries means that the clipping location can always be incremented in the generation loop, whichever way the line runs. The secondary delta is computed by subtracting Start.y from Finish.y.

If Finish.x is equal to or greater than Start.x, the code jumps to label L_01. The primary movement value is set to PIXEL_SIZE, which is the distance between horizontally adjacent pixels, moving from left to right. The current primary clipping location is set to Start.x, and the primary minimum and maximum clipping boundaries are set to the minimum and maximum x window boundaries, respectively. The secondary delta is computed by subtracting Start.y from Finish.y.

L_01:
        const   LP.gen.move_p, PIXEL_SIZE
        add     LP.gen.p, Start.x, 0
        add     LP.gen.min_p, GP.wnd.min_x, 0
        add     LP.gen.max_p, GP.wnd.max_x, 0

At label L_02, the secondary variables are computed similarly. If Finish.y is less than Start.y, the line is drawn from lower addresses to higher addresses. The secondary delta is negated, and the secondary movement value is set to the distance between vertically adjacent pixels. The current secondary clipping location is set to the negative of Start.y, and the secondary minimum and maximum clipping boundaries are set to the negatives of the maximum and minimum y window boundaries, respectively.



L_02:
        jmpf    LP.gen.delta_s, L_03
        const   LP.clp.skip_vec, Loop
        subr    LP.gen.delta_s, LP.gen.delta_s, 0
        add     LP.gen.move_s, GP.mem.width, 0
        subr    LP.gen.s, Start.y, 0
        subr    LP.gen.min_s, GP.wnd.max_y, 0
        jmp     L_04
        subr    LP.gen.max_s, GP.wnd.min_y, 0



If Finish.y is equal to or greater than Start.y, the code jumps to label L_03. The secondary movement value is set to the negative of the memory width, which is the distance between vertically adjacent pixels, moving from bottom to top. The current secondary clipping location is set to Start.y, and the secondary minimum and maximum clipping boundaries are set to the minimum and maximum y window boundaries, respectively.



L_03:
        subr    LP.gen.move_s, GP.mem.width, 0
        add     LP.gen.s, Start.y, 0
        add     LP.gen.min_s, GP.wnd.min_y, 0
        add     LP.gen.max_s, GP.wnd.max_y, 0
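The negation trick used for right-to-left and top-to-bottom lines can be checked in isolation. In this hypothetical C helper, a reversed axis stores the negated start coordinate and the negated, exchanged window bounds. The inside test then gives the same answer as it would on the original coordinates, while the location still only ever needs to be incremented.

```c
#include <assert.h>

/* Hypothetical model of one clipping axis in P_L2_02.S. */
typedef struct { int loc, min, max; } clip_axis;

static clip_axis make_clip_axis(int start, int lo, int hi, int reversed)
{
    clip_axis a;
    assert(lo <= hi);
    if (reversed) {          /* line runs toward smaller coordinates */
        a.loc = -start;
        a.min = -hi;         /* bounds are negated and exchanged */
        a.max = -lo;
    } else {
        a.loc = start;
        a.min = lo;
        a.max = hi;
    }
    return a;
}

/* The test performed by an asge/asle pair. */
static int axis_inside(const clip_axis *a)
{
    return a->min <= a->loc && a->loc <= a->max;
}
```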



At label L_04, the routine determines whether the x direction is actually going to be the primary direction. If not, the primary and secondary values are swapped.

L_04:
        cpge    Temp0, LP.gen.delta_p, LP.gen.delta_s
        jmpt    Temp0, L_05
        consth  LP.clp.skip_vec, Loop
        xor     LP.gen.delta_p, LP.gen.delta_p, LP.gen.delta_s
        xor     LP.gen.delta_s, LP.gen.delta_p, LP.gen.delta_s
        xor     LP.gen.delta_p, LP.gen.delta_p, LP.gen.delta_s
        xor     LP.gen.move_p, LP.gen.move_p, LP.gen.move_s
        xor     LP.gen.move_s, LP.gen.move_p, LP.gen.move_s
        xor     LP.gen.move_p, LP.gen.move_p, LP.gen.move_s
        xor     LP.gen.p, LP.gen.p, LP.gen.s
        xor     LP.gen.s, LP.gen.p, LP.gen.s
        xor     LP.gen.p, LP.gen.p, LP.gen.s
        xor     LP.gen.min_p, LP.gen.min_p, LP.gen.min_s
        xor     LP.gen.min_s, LP.gen.min_p, LP.gen.min_s
        xor     LP.gen.min_p, LP.gen.min_p, LP.gen.min_s
        xor     LP.gen.max_p, LP.gen.max_p, LP.gen.max_s
        xor     LP.gen.max_s, LP.gen.max_p, LP.gen.max_s
        xor     LP.gen.max_p, LP.gen.max_p, LP.gen.max_s

There are a total of five sets of values to be swapped, each requiring three XOR instructions:

Primary           Secondary
LP.gen.delta_p    LP.gen.delta_s
LP.gen.move_p     LP.gen.move_s
LP.gen.p          LP.gen.s
LP.gen.min_p      LP.gen.min_s
LP.gen.max_p      LP.gen.max_s

That is, the primary and secondary values for the delta, the movement value, and the clipping parameters are swapped.

At label L_05, the primary and secondary error increments are calculated. These error increments are derived from those described in Bresenham's algorithm. The initial error term is calculated.

L_05:
        sll     LP.gen.slope, LP.gen.delta_s, 1
        sll     LP.gen.x_slope, LP.gen.delta_p, 1
        add     LP.gen.error, LP.gen.error, LP.gen.slope
        sub     LP.gen.error, LP.gen.error, LP.gen.delta_p
        const   LP.clp.stop_vec, Stop
        consth  LP.clp.stop_vec, Stop
        jmp     L_08
        sub     LP.gen.count, LP.gen.delta_p, 1



The Clipping Stop Vector is set to the label Stop. The 
Clipping Skip Vector has already been set to the label 
Loop. The program jumps to the middle of the genera- 
tion loop, where the first pixel will be drawn if it is inside 
the clipping window. The generation loop count is calcu- 
lated by subtracting one from the primary delta. 

There are two possible paths through the generation loop, depending on whether the movement is in the primary direction or in the combined direction. If the movement is in the primary direction only, the path through the loop is as follows:

L_06:
        jump to L_07
        add primary adjust to error
L_07:
        add primary move to address
        inc primary clipping location
L_08:
        assert loc inside window
        call pixel op routine
        (nop)
Loop:
        decr count, jump L_06
        (nop)

This movement in the primary direction only requires 12 instructions plus any instructions in the pixel operation routine. This includes the four assert instructions.

If the movement is in the combined direction, the path through the loop is as follows:

L_06:
        (do not) jump to L_07
        add primary adjust to error
        subtract x_slope from error
        add secondary move to address
        inc secondary clipping location
L_07:
        add primary move to address
        inc primary clipping location
L_08:
        assert loc inside window
        call pixel op routine
        (nop)
Loop:
        decr count, jump L_06
        (nop)

This movement in the combined direction requires 15 instructions plus any instructions in the pixel operation routine. This includes the four assert instructions. Clipping is mechanized as described earlier for the clipped line routine that precedes P_L2_01.S, using the assert instructions and the Clipping Skip and Stop Vectors.



L_06:
        jmpt    LP.gen.error, L_07
        add     LP.gen.error, LP.gen.error, LP.gen.slope
        sub     LP.gen.error, LP.gen.error, LP.gen.x_slope
        add     LP.loc.addr, LP.loc.addr, LP.gen.move_s
        add     LP.gen.s, LP.gen.s, 1
L_07:
        add     LP.loc.addr, LP.loc.addr, LP.gen.move_p
        add     LP.gen.p, LP.gen.p, 1
L_08:
        asge    V_CLIP_SKIP, LP.gen.p, LP.gen.min_p
        asge    V_CLIP_SKIP, LP.gen.s, LP.gen.min_s
        asle    V_CLIP_STOP, LP.gen.p, LP.gen.max_p
        asle    V_CLIP_STOP, LP.gen.s, LP.gen.max_s
        calli   ret, GP.pxl.op_vec
        nop
Loop:
        jmpfdec LP.gen.count, L_06
        nop
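A C sketch of the complete clipped routine follows. As before, the frame buffer geometry and all names are hypothetical, the pixel operation is reduced to a plain store, and ordinary temporaries replace the XOR swaps. The order of the tests follows the four assertions at L_08: the skip tests come first, then the stop tests.

```c
#include <assert.h>
#include <stdint.h>

#define FB_W 8          /* hypothetical 8 x 8 frame buffer, 1 word/pixel */
#define FB_H 8

/* Row 0 is at the bottom, so a step toward larger y subtracts FB_W. */
static int fb_addr(int x, int y) { return (FB_H - 1 - y) * FB_W + x; }

/* Clipped single-width line in the style of P_L2_02.S. The window must
   lie inside the buffer; the end points may lie anywhere. */
static void draw_clipped(uint32_t *fb, int sx, int sy, int fx, int fy,
                         int min_x, int max_x, int min_y, int max_y,
                         uint32_t value)
{
    assert(0 <= min_x && min_x <= max_x && max_x < FB_W);
    assert(0 <= min_y && min_y <= max_y && max_y < FB_H);
    int delta_p = fx - sx, delta_s = fy - sy, t;
    int error = delta_p < 0 ? -1 : 0;
    int move_p, p, min_p, max_p, move_s, s, min_s, max_s;
    if (delta_p < 0) {            /* right to left: negate x clipping */
        delta_p = -delta_p; move_p = -1;
        p = -sx; min_p = -max_x; max_p = -min_x;
    } else {
        move_p = 1; p = sx; min_p = min_x; max_p = max_x;
    }
    if (delta_s < 0) {            /* downward: negate y clipping */
        delta_s = -delta_s; move_s = FB_W;
        s = -sy; min_s = -max_y; max_s = -min_y;
    } else {
        move_s = -FB_W; s = sy; min_s = min_y; max_s = max_y;
    }
    if (delta_p < delta_s) {      /* y primary: swap all five pairs */
        t = delta_p; delta_p = delta_s; delta_s = t;
        t = move_p;  move_p = move_s;   move_s = t;
        t = p;       p = s;             s = t;
        t = min_p;   min_p = min_s;     min_s = t;
        t = max_p;   max_p = max_s;     max_s = t;
    }
    int slope = 2 * delta_s, x_slope = 2 * delta_p;
    error += slope - delta_p;
    int addr = fb_addr(sx, sy);   /* only dereferenced inside the window */
    for (int count = delta_p; ; count--) {
        if (p < min_p || s < min_s) {
            /* V_CLIP_SKIP: not yet inside the window */
        } else if (p > max_p || s > max_s) {
            break;                /* V_CLIP_STOP: continue at Stop */
        } else {
            fb[addr] = value;
        }
        if (count == 0) break;
        if (error >= 0) {         /* combined move: secondary part */
            error -= x_slope;
            addr += move_s;
            s++;
        }
        error += slope;
        addr += move_p;           /* primary part */
        p++;
    }
}
```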

P_L3_01.S 

The routine P_L3_01.S is used to draw anti-aliased wide lines with user-supplied writing and anti-aliasing functions. The routine can also be used to draw wide lines without anti-aliasing. Clipping is not performed.

The actual bit map operations are done by routines supplied by the calling function. One of the parameters supplied by the calling routine to P_L3_01.S is the address of a user-written routine. The input parameters to the routine are the address of the pixel location and the required "coverage" of that pixel.

The coverage parameter is an integer in the range 1 through 256 inclusive, indicating the portion of the pixel (in 256ths) that is actually covered by the line. Alternatively, it can be thought of as a real number in the range 1/256 to 1, with the radix point between bits 7 and 8.
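The two views of the coverage parameter are easy to relate in C. The sketch below converts a coverage value to a fraction and uses it for the simplest kind of anti-aliasing, which scales an intensity linearly. The function names are hypothetical, and the sketch assumes 8-bit intensities.

```c
#include <assert.h>
#include <stdint.h>

/* Coverage 1..256 is a fixed-point fraction with 8 fraction bits. */
static double coverage_fraction(uint32_t coverage)
{
    assert(coverage >= 1 && coverage <= 256);
    return coverage / 256.0;
}

/* Linear anti-aliasing: multiply, then drop the 8 fraction bits. */
static uint32_t scale_intensity(uint32_t intensity, uint32_t coverage)
{
    assert(coverage >= 1 && coverage <= 256 && intensity <= 255);
    return (intensity * coverage) >> 8;
}
```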

The routine begins with the normal global functions. Eleven parameters are loaded from the structure G29K_Params.

GP.pxl.value      lr6
GP.mem.width      lr7
GP.mem.depth      lr8
GP.wnd.base       lr9
GP.wnd.align      lr10
GP.pxl.op_vec     lr11
GP.pxl.in_mask    lr12
GP.pxl.do_mask    lr13
GP.pxl.do_value   lr14
GP.pxl.out_mask   lr15
GP.wid.actual     lr16

        const   Temp0, _G29K_Params + 4*4
        consth  Temp0, _G29K_Params + 4*4
        mtsrim  cr, (11 - 1)
        loadm   0, 0, GP.pxl.value, Temp0

Routine S_M1_01 is called to convert the Start.x, Start.y pair to a linear address. The linear address is returned in LP.loc.addr.

        add     LP.loc.x, Start.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Start.y, 0




        sub     LP.gen.delta_p, Finish.x, Start.x
        jmpf    LP.gen.delta_p, L_01
        sub     LP.gen.delta_s, Finish.y, Start.y
        subr    LP.gen.delta_p, LP.gen.delta_p, 0
        jmp     L_02
        constn  LP.gen.move_p, -PIXEL_SIZE

This routine assumes that the primary direction will be x and the secondary direction will be y. If this is not the case, the parameters will be swapped at L_04. The primary delta is calculated by subtracting Start.x from Finish.x. If Finish.x is greater than or equal to Start.x, the code at L_01 sets the primary movement value to PIXEL_SIZE, which is the distance in bytes between horizontally adjacent pixels. The secondary delta is calculated by subtracting Start.y from Finish.y.



L_01:
        const   LP.gen.move_p, PIXEL_SIZE



If Finish.x is less than Start.x, the primary delta is negated and the primary movement value is set to the negative of PIXEL_SIZE. The secondary delta is calculated by subtracting Start.y from Finish.y.

At label L_02, the secondary parameters are calculated in a similar manner. If Finish.y is less than Start.y, the secondary delta is negated and the secondary movement value is set to GP.mem.width. If Finish.y is greater than or equal to Start.y, the code jumps to L_03, where the secondary movement value is set to the negative of GP.mem.width. In either case, the beginning address LP.gen.addr is set to the beginning pixel address calculated from Start.x, Start.y.



L_02:
        jmpf    LP.gen.delta_s, L_03
        add     LP.gen.addr, LP.loc.addr, 0
        subr    LP.gen.delta_s, LP.gen.delta_s, 0
        jmp     L_04
        add     LP.gen.move_s, GP.mem.width, 0
L_03:
        subr    LP.gen.move_s, GP.mem.width, 0
At label L_04, the code determines which axis will be primary. If the primary delta is greater than or equal to the secondary delta, the initial assumption was correct and nothing extra need be done. If the primary delta is less than the secondary delta, the primary direction will be y, and two pairs of parameters must be swapped:

LP.gen.delta_p    LP.gen.delta_s
LP.gen.move_p     LP.gen.move_s

L_04:
        cpge    Temp0, LP.gen.delta_p, LP.gen.delta_s
        jmpt    Temp0, L_05
        const   LP.gen.error, 0
        xor     LP.gen.delta_p, LP.gen.delta_p, LP.gen.delta_s
        xor     LP.gen.delta_s, LP.gen.delta_p, LP.gen.delta_s
        xor     LP.gen.delta_p, LP.gen.delta_p, LP.gen.delta_s
        xor     LP.gen.move_p, LP.gen.move_p, LP.gen.move_s
        xor     LP.gen.move_s, LP.gen.move_p, LP.gen.move_s
        xor     LP.gen.move_p, LP.gen.move_p, LP.gen.move_s



In either case, the error term is forced to zero. At label L_05, the axial half-width is calculated. Figure 6 illustrates this derivation.

The line (or a representative piece of the line) is illustrated by the heavy line from S to F. The wide line to be drawn covers the area between the two lines parallel to the heavy line. The specific line segment in the figure has a slope of Δs/Δp = 2/4 = 0.5. The distance to be determined is the distance from the center of the line to the exact edge of the line in the secondary direction. This is distance h in Figure 6.



L_05:
        mtsr    q, LP.gen.delta_p
        mulu    Temp0, LP.gen.delta_p, 0
        mulu    Temp0, LP.gen.delta_p, Temp0
        mulu    Temp0, LP.gen.delta_p, Temp0
        mfsr    Temp1, q
        mtsrim  fc, 16
        extract Temp0, Temp0, Temp1
        mtsr    q, LP.gen.delta_s
        mulu    LP.wid.axial, LP.gen.delta_s, 0
        mulu    LP.wid.axial, LP.gen.delta_s, LP.wid.axial
        mulu    LP.wid.axial, LP.gen.delta_s, LP.wid.axial
        mfsr    Temp1, q
        mtsrim  fc, 16
        extract LP.wid.axial, LP.wid.axial, Temp1
        add     LP.wid.axial, Temp0, LP.wid.axial
        srl     Temp2, LP.wid.axial, 16
        sll     LP.wid.axial, LP.wid.axial, 16
        mtsr    q, LP.wid.axial
        div0    Temp1, Temp2
        div     Temp1, Temp1, Temp0
        div     Temp1, Temp1, Temp0
        divl    Temp1, Temp1, Temp0
        mfsr    LP.wid.axial, q
        clz     Temp0, LP.wid.axial
        subr    Temp0, Temp0, 32
        srl     Temp0, Temp0, 1
        srl     Temp1, LP.wid.axial, Temp0
        add     Temp0, Temp1, 0

Figure 6. Axial Half-Width (example segment with Δs = 2 and Δp = 4)

There are two triangles, SBF and the smaller sbf, each identified by its three vertices. Angle B and angle b are right angles. Sides FB and fs are vertical and are therefore parallel. Angle F and angle f are equal because they are formed by parallel lines intersected by a common line. The triangles are similar because they have two (and therefore three) congruent angles. Because the triangles are similar, corresponding sides must be in proportion. That is, the ratio of the hypotenuses equals the ratio of the sides sb and SB. Here sb is half the actual line width a, SB is Δp, and the hypotenuse fs of the small triangle is the axial half-width h:

$$\frac{h}{\sqrt{\Delta p^2 + \Delta s^2}} = \frac{a/2}{\Delta p}$$

Rearranging, and moving the bottom Δp inside the radical for convenience, yields the equation:

$$h = \frac{a}{2}\sqrt{\frac{\Delta p^2 + \Delta s^2}{\Delta p^2}}$$



The equation is evaluated in the following manner. The square of delta_p is calculated, and the result is left in Temp0. The square of delta_s is calculated, and the result is left in LP.wid.axial. The sum of the squares is calculated, shifted left 16 bit positions, and divided by delta_p squared. The result will be a real number between 1 and 2 (because the primary delta is at least as large as the secondary delta), with the radix point between bit positions 15 and 16.

The square root of the quotient is calculated iteratively at L_07. The square root is multiplied by the actual line width and the result is divided by two to get the half-width. This is left in LP.wid.axial.



L_07:
        mtsr    q, LP.wid.axial
        div0    Temp2, 0
        div     Temp2, Temp2, Temp1
        div     Temp2, Temp2, Temp1
        divl    Temp2, Temp2, Temp1
        mfsr    Temp2, q
        add     Temp2, Temp2, Temp1
        srl     Temp2, Temp2, 1
        cpeq    Temp0, Temp2, Temp0
        jmpt    Temp0, L_08
        add     Temp0, Temp1, 0
        cpeq    Temp1, Temp2, Temp1
        jmpf    Temp1, L_07
        add     Temp1, Temp2, 0
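The fixed-point evaluation of the axial half-width, including the Newton iteration at L_07, can be modeled in C as below. This is a sketch, not a transcription, and it differs from the listing in three ways:

- it uses 64-bit C arithmetic instead of the Am29000 multiply and divide steps;
- the function names are hypothetical;
- limiting the line width to 8 bits is an assumption, based on the eight multiply steps at L_08.

As in the listing, the initial guess comes from the bit length of the value (the clz sequence). The loop stops when a new estimate equals either of the two previous estimates, which also ends a plus-or-minus-one oscillation.

```c
#include <assert.h>
#include <stdint.h>

/* Newton square root modeled on L_07; the result is floor(sqrt(n))
   or one more than it. */
static uint32_t sqrt_newton(uint32_t n)
{
    assert(n > 0);
    int bits = 0;                          /* bit length, i.e. 32 - clz(n) */
    for (uint32_t t = n; t != 0; t >>= 1) bits++;
    uint32_t older = n >> (bits / 2);      /* initial guess                */
    uint32_t guess = older;
    for (;;) {
        uint32_t next = (n / guess + guess) >> 1;
        if (next == older || next == guess) return next;
        older = guess;
        guess = next;
    }
}

/* Axial half-width in 1/256 pixel: (a/2) * sqrt((dp^2 + ds^2) / dp^2). */
static uint32_t axial_half_width(uint32_t dp, uint32_t ds, uint32_t width)
{
    assert(dp > 0 && ds <= dp && width < 256);
    uint64_t dp2 = (uint64_t)dp * dp;
    uint64_t sum = dp2 + (uint64_t)ds * ds;
    uint32_t ratio = (uint32_t)((sum << 16) / dp2);  /* 16.16, 1.0 to 2.0 */
    uint32_t root = sqrt_newton(ratio);              /* 8.8 fixed point   */
    return (root * width) >> 1;                      /* times a, halved   */
}
```

Take the segment of Figure 6 (Δp = 4, Δs = 2) drawn 3 pixels wide. The ratio is 81920 (1.25 in 16.16 form), the root is 286, and the half-width is 429/256. That is about 1.68 pixels, in agreement with (3/2)·√1.25 ≈ 1.677.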



The slope of the line is calculated by dividing the secondary delta times 256 by the primary delta. The quotient is left in LP.gen.slope, and the remainder is left in LP.gen.x_slope. The primary error term, LP.gen.x_error, is set to the negative of the primary delta.

The loop count is calculated by subtracting one from the primary delta, and the generation loop is entered at the middle. The first point is always drawn and is centered in the line.

L_08:
        mtsr    q, GP.wid.actual
        mulu    Temp0, Temp2, 0
        mulu    Temp0, Temp2, Temp0
        mulu    Temp0, Temp2, Temp0
        mulu    Temp0, Temp2, Temp0
        mulu    Temp0, Temp2, Temp0
        mulu    Temp0, Temp2, Temp0
        mulu    Temp0, Temp2, Temp0
        mulu    Temp0, Temp2, Temp0
        mfsr    Temp1, q
        mtsrim  fc, 7
        extract LP.wid.axial, Temp0, Temp1
        sll     Temp0, LP.gen.delta_s, 8
        mtsr    q, Temp0
        div0    Temp1, 0
        div     Temp1, Temp1, LP.gen.delta_p
        div     Temp1, Temp1, LP.gen.delta_p
        divl    Temp1, Temp1, LP.gen.delta_p
        divrem  LP.gen.x_slope, Temp1, LP.gen.delta_p
        mfsr    LP.gen.slope, q
        subr    LP.gen.x_error, LP.gen.delta_p, 0
        jmp     L_11
        sub     LP.gen.count, LP.gen.delta_p, 1

The generation loop begins at L_09. The slope is added to the error term. If LP.gen.x_error is not negative (the accumulated remainders have reached a whole 1/256 step), the primary delta is subtracted from it, and the error term is incremented by 1.

L_09:
        jmpt    LP.gen.x_error, L_10
        add     LP.gen.error, LP.gen.error, LP.gen.slope
        sub     LP.gen.x_error, LP.gen.x_error, LP.gen.delta_p
        add     LP.gen.error, LP.gen.error, 1

At label L_10, the position of the next optimum pixel is calculated. This calculation is carried out with a resolution of 1/256 of a real pixel. The primary movement is added into the address. If a combined movement is required (the error has reached 128, half a pixel), the error is reduced by 256 (one full pixel), and the secondary movement is added in.



L_10:
        add     LP.gen.addr, LP.gen.addr, LP.gen.move_p
        cpge    Temp2, LP.gen.error, 128
        jmpf    Temp2, L_11
        nop
        sub     LP.gen.error, LP.gen.error, 128
        sub     LP.gen.error, LP.gen.error, 128
        add     LP.gen.addr, LP.gen.addr, LP.gen.move_s
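The secondary-position bookkeeping covers L_08 through L_10, plus the x_error update at L_21. It keeps the exact rational position of the line: LP.gen.error holds whole 1/256-pixel steps, and LP.gen.x_error holds the division remainder. Below is a hypothetical C model that records which secondary row is chosen at each primary step; the function name and output array are illustrative only.

```c
#include <assert.h>

/* Writes the chosen secondary index for each primary step into
   s_out[0..dp]; error is kept in [-128, 128) in 1/256 pixel. */
static void track_secondary(int dp, int ds, int *s_out, int *final_error)
{
    assert(dp > 0 && ds >= 0 && ds <= dp);
    int slope   = (ds * 256) / dp;      /* quotient, from q             */
    int x_slope = (ds * 256) % dp;      /* remainder, from divrem       */
    int x_error = -dp;
    int error = 0, s = 0;
    s_out[0] = 0;                       /* first pixel is centered      */
    for (int i = 1; i <= dp; i++) {
        x_error += x_slope;             /* delay slot of jmpfdec at L_21 */
        error += slope;                 /* L_09                         */
        if (x_error >= 0) {             /* remainders made a whole step */
            x_error -= dp;
            error += 1;
        }
        if (error >= 128) {             /* L_10: past half a pixel      */
            error -= 256;
            s++;
        }
        s_out[i] = s;
    }
    *final_error = error;
}
```

For Δp = 4 and Δs = 2, the chosen rows are 0, 1, 1, 2, 2. For Δp = 3 and Δs = 1, they are 0, 0, 1, 1. In both cases the error ends back at 0, because the end point lies exactly on a pixel.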



At label L_11, the distances the line will extend to either side of the optimum pixel are computed. Figure 7 illustrates this and shows why it is important to calculate the axial half-width so precisely. When the generation loop is calculating pixel addresses, it always chooses exact pixels in the primary direction. That is, each position it chooses in the primary direction maps to an actual pixel location in the primary direction. It almost never maps to an actual pixel location in the secondary direction, except at the end points. By calculating the width of the line in the secondary direction very precisely, the routine is able to determine with equal precision the portion of the "outside" pixels that are covered.

L_11:
        add     LP.wid.side_1, LP.wid.axial, 0
        add     LP.wid.side_2, LP.wid.axial, 0
        add     LP.gen.cover, LP.gen.error, 128
        sub     LP.wid.side_1, LP.wid.side_1, LP.gen.cover
        jmpf    LP.wid.side_1, L_12
        subr    Temp2, LP.gen.error, 128
        add     LP.gen.cover, LP.gen.cover, LP.wid.side_1

Figure 7. Pixel Coverage
In Figure 7, each actual pixel is shown as a circle. On a real monitor, the area illuminated by each pixel may be larger or smaller, depending on the intensity and focus.

The generation loop has chosen the shaded area as the point in the secondary direction that corresponds to the column chosen in the primary direction. (The primary direction is horizontal.) The column of pixels to the right of the column of interest has been deleted to make room for the coverage values. The actual width has been set to 3. The axial half-width with the indicated slope is something more than 1.5 but less than 2.0. The two pixels closest to the ideal point will be completely covered by the line. The next pixel down will be well over 50 percent covered by the line, perhaps 85 percent. The pixel at the top will be only somewhat covered, perhaps 15 percent.

L_12:
        add     LP.gen.cover, LP.gen.cover, Temp2
        sub     LP.wid.side_2, LP.wid.side_2, Temp2
        jmpf    LP.wid.side_2, L_13
        add     LP.loc.addr, LP.gen.addr, 0
        add     LP.gen.cover, LP.gen.cover, LP.wid.side_2

At label L_13, the optimum pixel is drawn by calling the user-supplied routine with the address and coverage of the optimum pixel. This pixel may be fully covered or may be only partially covered. There is nothing special about the optimum pixel except that it will always be drawn, regardless of the line width, and will be drawn outside the loops for each of the two sides. The routine determines whether any pixels or partial pixels are required on side 1. If not, it jumps to L_18 to test for pixels on side 2.

L_13:
        calli   ret, GP.pxl.op_vec
        cpgt    Temp3, LP.wid.side_1, 0
        jmpf    Temp3, L_18
        const   LP.gen.cover, 256

If pixels are required on side 1, they are drawn in the loop beginning at L_14. For a line in the first octant, these pixels will be below the optimum pixel. The address is decremented by the secondary movement value to find the next pixel in the secondary direction. The width remaining to side 1 is decremented by the value in LP.gen.cover. This parameter was initialized to 256 (for full coverage). If reducing the width causes it to be less than zero, it is added into LP.gen.cover (this actually reduces LP.gen.cover).

L_14:
        sub     LP.loc.addr, LP.loc.addr, LP.gen.move_s
        sub     LP.wid.side_1, LP.wid.side_1, LP.gen.cover
        jmpf    LP.wid.side_1, L_15
        nop
        add     LP.gen.cover, LP.gen.cover, LP.wid.side_1



At label L_15, the user-supplied routine is called to draw this pixel. The remaining width of side 1 is compared to zero. If it is greater than zero, the routine loops back to L_14 to draw the next pixel. In the example shown in Figure 7, the first pixel on side 1 is 85 percent covered and the routine does not loop back. The coverage is set back to 256 for full coverage.

L_15:
        calli   ret, GP.pxl.op_vec
        cpgt    Temp3, LP.wid.side_1, 0
        jmpt    Temp3, L_14
        const   LP.gen.cover, 256

At label L_18, the routine tests the remaining width of 
side 2 and restores the pixel address back to the opti- 
mum pixel. In the example in Figure 7, two pixels will be 
drawn on side 2 (above the optimum pixel). 

L_18:
        cpgt    Temp3, LP.wid.side_2, 0
        jmpf    Temp3, L_21
        add     LP.loc.addr, LP.gen.addr, 0

In the loop starting at label L_19, the pixels above the 
optimum pixel are drawn. The pixel address is adjusted 
by adding the secondary movement, and the width re- 
maining to side 2 is reduced by the amount the pixel is to 
be covered. This was previously set to 256 for complete 
coverage. If the remaining width is reduced to less than 
zero, the width is added back into the coverage (actually 
reducing it to below 256 for the outside pixel). In the ex- 
ample in Figure 7, the width will not be reduced to below 
zero. 

L_19:
        add     LP.loc.addr, LP.loc.addr, LP.gen.move_s
        sub     LP.wid.side_2, LP.wid.side_2, LP.gen.cover
        jmpf    LP.wid.side_2, L_20
        nop
        add     LP.gen.cover, LP.gen.cover, LP.wid.side_2

At label L_20, the user-supplied routine is called with the address and coverage to actually draw the pixel. If the remaining width is greater than zero, the routine loops back to L_19 to draw the next pixel. The coverage is set to 256. In the example, the routine will loop once after drawing the pixel just above the optimum pixel. Every pixel except the outside one will have a coverage value of 256. That is, only the outside pixels are not completely covered.
L_20:
        calli   ret, GP.pxl.op_vec
        cpgt    Temp3, LP.wid.side_2, 0
        jmpt    Temp3, L_19
        const   LP.gen.cover, 256

At label L_21, the routine decrements and tests the loop count and adds the slope remainder, LP.gen.x_slope, into the primary error term, LP.gen.x_error. If more pixels are required in the primary direction (that is, if the line is not complete), the routine continues at label L_09.

L_21:
        jmpfdec LP.gen.count, L_09
        add     LP.gen.x_error, LP.gen.x_error, LP.gen.x_slope
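The coverage logic of L_11 through L_20 for one primary position can be summarized in C. In this hypothetical sketch, error is the offset of the line center from the center of the optimum pixel, and axial is the axial half-width; both are in 1/256 pixel. The coverages are returned in spatial order, from side 1 to side 2. The routine itself draws the optimum pixel first, then side 1, then side 2.

```c
#include <assert.h>

/* Fills cover[] (at most max_out entries) and returns the count. */
static int column_coverage(int error, int axial, int *cover, int max_out)
{
    assert(error >= -128 && error < 128 && axial > 0);
    int side_1 = axial - (error + 128);  /* width left below the pixel  */
    int side_2 = axial - (128 - error);  /* width left above the pixel  */
    int center = 256;                    /* L_11/L_12: optimum pixel    */
    if (side_1 < 0) center += side_1;    /* edge ends inside the pixel  */
    if (side_2 < 0) center += side_2;
    int nb = side_1 > 0 ? (side_1 + 255) / 256 : 0;  /* side-1 pixels  */
    int n = 0;
    /* Side 1 (L_14/L_15): each pixel takes min(256, remaining width);
       written farthest-first so the array runs from side 1 to side 2. */
    for (int k = nb - 1; k >= 0 && n < max_out; k--) {
        int rest = side_1 - 256 * k;
        cover[n++] = rest < 256 ? rest : 256;
    }
    if (n < max_out) cover[n++] = center;
    /* Side 2 (L_19/L_20), nearest pixel first. */
    for (int rest = side_2; rest > 0 && n < max_out; rest -= 256)
        cover[n++] = rest < 256 ? rest : 256;
    return n;
}
```

The coverages in a column always add up to twice the axial half-width. For example, with an axial half-width of 429 and the line centered on the optimum pixel, the column is 45, 256, 256, 256, 45.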

An example of a user-supplied drawing routine is in the file O2_02.S, which is included on the distribution diskette. This is an almost trivial example that scales the input value to the top of the word and stores it. That is, the illumination of a pixel is linearly related to its coverage. Lines drawn with such a function will exhibit the "barber-pole" effect.

The pixel operation routine can also use the coverage value as an independent variable in a more complicated anti-aliasing function. It can also correct for specific hardware differences, such as the actual shape or size of the pixel, or be used to scale color model component intensities for color anti-aliasing. Of course, these more complex operations require more computation, with a corresponding reduction in overall drawing speed.

Wide lines can be drawn without anti-aliasing by supply- 
ing a routine that ignores the coverage input and just 
draws with the current pixel color. Logical or arithmetic 
operations could be used as well. 

P_L3_02.S 

The routine P_L3_02.S is used to draw anti-aliased wide lines with user-supplied writing and anti-aliasing functions. The routine can also be used to draw wide lines without anti-aliasing. Clipping is also performed.

The actual bit map operations are done by routines supplied by the calling function. One of the parameters supplied by the calling routine to P_L3_02.S is the address of a user-written routine. That routine's input parameters are the address of the pixel location and the required "coverage" of that pixel.

The coverage parameter is an integer in the range 1 through 256 inclusive, indicating the portion of the pixel (in 256ths) that is actually covered by the line. Alternatively, it can be thought of as a real number in the range 1/256 to 1, with the radix point between bit 7 and bit 8.
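The fixed-point reading of the coverage value can be made concrete with a small C helper (a hypothetical name, for illustration only). It splits the value at the radix point between bit 7 and bit 8.

```c
#include <stdint.h>

/* Interpret a coverage value (1..256) as unsigned fixed point with
   8 fraction bits: 256 = 1.0, 128 = 0.5, 1 = 1/256. */
static double coverage_to_real(uint32_t coverage)
{
    uint32_t whole = coverage >> 8;     /* bit 8 and up: integer part */
    uint32_t frac  = coverage & 0xFFu;  /* bits 0-7: 256ths */
    return whole + frac / 256.0;
}
```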

The routine begins with the normal global functions. 



Fifteen parameters are loaded from the structure G29K_Params.

        GP.wnd.min_x        lr2
        GP.wnd.max_x        lr3
        GP.wnd.min_y        lr4
        GP.wnd.max_y        lr5
        GP.pxl.value        lr6
        GP.mem.width        lr7
        GP.mem.depth        lr8
        GP.wnd.base         lr9
        GP.wnd.align        lr10
        GP.pxl.op_vec       lr11
        GP.pxl.in_mask      lr12
        GP.pxl.do_mask      lr13
        GP.pxl.do_value     lr14
        GP.pxl.out_mask     lr15
        GP.wid.actual       lr16

        const   Temp0,_G29K_Params
        consth  Temp0,_G29K_Params
        mtsrim  cr,(15 - 1)
        loadm   0,0,GP.wnd.min_x,Temp0

Routine S_M1_01 is called to convert the Start.x, Start.y pair to a linear address. The linear address is returned in LP.loc.addr.

        add     LP.loc.x,Start.x,0
        call    ret,S_M1_01
        add     LP.loc.y,Start.y,0
        sub     LP.gen.delta_p,Finish.x,Start.x
        jmpf    LP.gen.delta_p,L_01
        sub     LP.gen.delta_s,Finish.y,Start.y
        subr    LP.gen.delta_p,LP.gen.delta_p,0
        constn  LP.gen.move_p,-PIXEL_SIZE
        subr    LP.gen.p,Start.x,0
        subr    LP.gen.min_p,GP.wnd.max_x,0
        jmp     L_02
        subr    LP.gen.max_p,GP.wnd.min_x,0

The routine assumes that the primary direction will be x and that the secondary direction will be y. The primary delta is calculated by subtracting Start.x from Finish.x. If the result is greater than or equal to zero, the code at L_01 sets the primary movement value to the distance between pixels (moving from left to right), sets the primary clipping point to Start.x, and sets the primary minimum and maximum clipping boundaries to the minimum and maximum window values, respectively. The secondary delta is calculated by subtracting Start.y from Finish.y.
L_01:
        const   LP.gen.move_p,PIXEL_SIZE
        add     LP.gen.p,Start.x,0
        add     LP.gen.min_p,GP.wnd.min_x,0
        add     LP.gen.max_p,GP.wnd.max_x,0



If Finish.x is less than Start.x, the line will be drawn from right to left. The primary delta is negated, and the primary movement value is set to the distance between pixels (from right to left). The primary clipping point is set to the negative of Start.x, and the minimum and maximum clipping boundaries are set to the negated maximum and minimum x window values, respectively. The secondary delta is calculated by subtracting Start.y from Finish.y.
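The per-axis setup just described can be sketched in C. In this sketch, `Axis` and `setup_axis` are hypothetical names, and `step` is assumed to be the memory distance for one move in the positive coordinate direction (PIXEL_SIZE for x; for y the routine uses the negative of GP.mem.width). When the line runs backwards, the sketch negates the position and uses the negated, exchanged window bounds. The drawing loop can then always increment the position and apply the same "at least min" and "at most max" tests.

```c
/* Hypothetical per-axis line setup with the negation trick. */
typedef struct {
    int delta;  /* |finish - start| */
    int move;   /* memory step per pixel along this axis */
    int pos;    /* clipping position (negated when moving backwards) */
    int min;    /* lower clipping bound for the "skip" test */
    int max;    /* upper clipping bound for the "stop" test */
} Axis;

static Axis setup_axis(int start, int finish, int wnd_min, int wnd_max, int step)
{
    Axis a;
    a.delta = finish - start;
    if (a.delta >= 0) {
        a.move = step;
        a.pos = start;
        a.min = wnd_min;
        a.max = wnd_max;
    } else {
        a.delta = -a.delta;
        a.move = -step;
        a.pos = -start;     /* negated position ...                   */
        a.min = -wnd_max;   /* ... with negated, exchanged bounds, so  */
        a.max = -wnd_min;   /* the same two comparisons still apply    */
    }
    return a;
}
```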

At label L_02, the secondary values are calculated from the y-axis inputs. If Finish.y is greater than or equal to Start.y, the routine continues at label L_03. It sets the secondary movement value to the negative of the memory width (moving from bottom to top) and sets the current secondary clipping position to Start.y. The secondary minimum and maximum clipping boundaries are set to the minimum and maximum y window values, respectively. The general address is initialized to the beginning address.



L_02:
        jmpf    LP.gen.delta_s,L_03
        add     LP.gen.addr,LP.loc.addr,0
        subr    LP.gen.delta_s,LP.gen.delta_s,0
        add     LP.gen.move_s,GP.mem.width,0
        subr    LP.gen.s,Start.y,0
        subr    LP.gen.min_s,GP.wnd.max_y,0
        jmp     L_04
        subr    LP.gen.max_s,GP.wnd.min_y,0

L_03:
        subr    LP.gen.move_s,GP.mem.width,0
        add     LP.gen.s,Start.y,0
        add     LP.gen.min_s,GP.wnd.min_y,0
        add     LP.gen.max_s,GP.wnd.max_y,0



If Finish.y is less than Start.y, the line will be drawn from top to bottom. The secondary delta is negated, and the secondary movement value is set to the memory width. The current secondary clipping position is set to the negative of Start.y. The secondary minimum and maximum clipping boundaries are set to the negated maximum and minimum y window values, respectively. The general address is initialized to the beginning address.

At label L_04, the routine determines whether x should be the primary direction. If so, the initial assumption is correct and nothing special need be done. LP.gen.error is set to zero.

If the primary direction is to be y, the primary and secondary parameters must be swapped. In particular, the following five pairs of variables are swapped:



        LP.gen.delta_p      LP.gen.delta_s
        LP.gen.move_p       LP.gen.move_s
        LP.gen.p            LP.gen.s
        LP.gen.min_p        LP.gen.min_s
        LP.gen.max_p        LP.gen.max_s



That is, the deltas and movement values, as well as all the clipping values, are swapped.

L_04:
        cpge    Temp0,LP.gen.delta_p,LP.gen.delta_s
        jmpt    Temp0,L_05
        const   LP.gen.error,0
        xor     LP.gen.delta_p,LP.gen.delta_p,LP.gen.delta_s
        xor     LP.gen.delta_s,LP.gen.delta_p,LP.gen.delta_s
        xor     LP.gen.delta_p,LP.gen.delta_p,LP.gen.delta_s
        xor     LP.gen.move_p,LP.gen.move_p,LP.gen.move_s
        xor     LP.gen.move_s,LP.gen.move_p,LP.gen.move_s
        xor     LP.gen.move_p,LP.gen.move_p,LP.gen.move_s
        xor     LP.gen.p,LP.gen.p,LP.gen.s
        xor     LP.gen.s,LP.gen.p,LP.gen.s
        xor     LP.gen.p,LP.gen.p,LP.gen.s
        xor     LP.gen.min_p,LP.gen.min_p,LP.gen.min_s
        xor     LP.gen.min_s,LP.gen.min_p,LP.gen.min_s
        xor     LP.gen.min_p,LP.gen.min_p,LP.gen.min_s
        xor     LP.gen.max_p,LP.gen.max_p,LP.gen.max_s
        xor     LP.gen.max_s,LP.gen.max_p,LP.gen.max_s
        xor     LP.gen.max_p,LP.gen.max_p,LP.gen.max_s
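Each group of three xor instructions above exchanges two registers without a temporary. The same sequence in C (`xor_swap` is a hypothetical name) looks like this:

```c
#include <stdint.h>

/* Exchange two values with three exclusive-ORs and no temporary, in the
   same order as each xor triple in the listing. a and b must be distinct
   objects; if they alias, the value is cleared to zero. */
static void xor_swap(int32_t *a, int32_t *b)
{
    *a ^= *b;   /* a = a ^ b            */
    *b ^= *a;   /* b = b ^ (a ^ b) = a  */
    *a ^= *b;   /* a = (a ^ b) ^ a = b  */
}
```

In C, a temporary variable is normally just as cheap. On the Am29000, the xor form simply avoids tying up an extra local register.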

At label L_05, the axial half-width is calculated. This is 
done in the same way as in routine P_L3_01.S. 



L_05:
        mtsr    q,LP.gen.delta_p
        mulu    Temp0,LP.gen.delta_p,0
        mulu    Temp0,LP.gen.delta_p,Temp0
        mulu    Temp0,LP.gen.delta_p,Temp0
        mfsr    Temp1,q
        mtsrim  fc,16
        extract Temp0,Temp0,Temp1
        mtsr    q,LP.gen.delta_s
        mulu    LP.wid.axial,LP.gen.delta_s,0
        mulu    LP.wid.axial,LP.gen.delta_s,LP.wid.axial
        mulu    LP.wid.axial,LP.gen.delta_s,LP.wid.axial
        mfsr    Temp1,q
        mtsrim  fc,16
        extract LP.wid.axial,LP.wid.axial,Temp1
        add     LP.wid.axial,Temp0,LP.wid.axial
        srl     Temp2,LP.wid.axial,16
        sll     LP.wid.axial,LP.wid.axial,16
        mtsr    q,LP.wid.axial
        div0    Temp1,Temp2
        div     Temp1,Temp1,Temp0
        div     Temp1,Temp1,Temp0
        divl    Temp1,Temp1,Temp0
        mfsr    LP.wid.axial,q
        clz     Temp0,LP.wid.axial
        subr    Temp0,Temp0,32
        srl     Temp0,Temp0,1
        srl     Temp1,LP.wid.axial,Temp0
        add     Temp0,Temp1,

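At label L_07, the square root is calculated iteratively by Newton's method. Each pass divides the value by the current guess, adds the guess, and halves the sum. The loop ends when the new guess equals either of the two previous guesses.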

L_07:
        mtsr    q,LP.wid.axial
        div0    Temp2,
        div     Temp2,Temp2,Temp1
        div     Temp2,Temp2,Temp1
        divl    Temp2,Temp2,Temp1
        mfsr    Temp2,q
        add     Temp2,Temp2,Temp1
        srl     Temp2,Temp2,1
        cpeq    Temp0,Temp2,Temp0
        jmpt    Temp0,L_08
        add     Temp0,Temp1,0
        cpeq    Temp1,Temp2,Temp1
        jmpf    Temp1,L_07
        add     Temp1,Temp2,0
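The square-root step (the initial guess at the end of L_05 plus the L_07 loop) can be sketched in C. `isqrt_newton` is a hypothetical name. It takes a plain unsigned integer rather than the fixed-point value in LP.wid.axial, and it counts significant bits with a loop where the listing uses clz. Because the stop test accepts either value of a two-guess oscillation, the result is floor(sqrt(n)) or one more.

```c
#include <stdint.h>

/* Integer Newton square root, structured like the L_07 loop. */
static uint32_t isqrt_newton(uint32_t n)
{
    if (n == 0)
        return 0;
    int bits = 0;                            /* significant bits (32 - clz) */
    for (uint32_t t = n; t != 0; t >>= 1)
        bits++;
    uint32_t prev2 = 0;                      /* guess before last */
    uint32_t prev = n >> (bits / 2);         /* initial guess, always >= 1 */
    for (;;) {
        /* divide, add the guess, halve (64-bit sum avoids overflow) */
        uint32_t next = (uint32_t)(((uint64_t)(n / prev) + prev) / 2);
        if (next == prev || next == prev2)   /* converged or oscillating */
            return next;
        prev2 = prev;
        prev = next;
    }
}
```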



At label L_08, the slope is calculated, and then the clipping destinations are initialized. There are a total of six destinations: two in the primary direction and four in the secondary direction. The destinations are loaded into the vector pointers depending on whether the current clipping tests are primary, secondary side 1, or secondary side 2. The actions performed for each case are summarized in Table 6.

L_08:
        mtsr    q,GP.wid.actual
        mulu    Temp0,Temp2,0
        mulu    Temp0,Temp2,Temp0
        mulu    Temp0,Temp2,Temp0
        mulu    Temp0,Temp2,Temp0
        mulu    Temp0,Temp2,Temp0
        mulu    Temp0,Temp2,Temp0
        mulu    Temp0,Temp2,Temp0
        mulu    Temp0,Temp2,Temp0
        mfsr    Temp1,q
        mtsrim  fc,7
        extract LP.wid.axial,Temp0,Temp1
        sll     Temp0,LP.gen.delta_s,8
        mtsr    q,Temp0
        div0    Temp1,
        div     Temp1,Temp1,LP.gen.delta_p
        div     Temp1,Temp1,LP.gen.delta_p
        divl    Temp1,Temp1,LP.gen.delta_p
        divrem  LP.gen.x_slope,Temp1,LP.gen.delta_p
        mfsr    LP.gen.slope,q
        subr    LP.gen.x_error,LP.gen.delta_p,
        const   LP.clp.skip_p,Skip_p
        consth  LP.clp.skip_p,Skip_p
        const   LP.clp.stop_p,Stop_p
        consth  LP.clp.stop_p,Stop_p
        const   LP.clp.skip_s,Skip_s
        consth  LP.clp.skip_s,Skip_s
        const   LP.clp.skip_s_1,Skip_s_1
        consth  LP.clp.skip_s_1,Skip_s_1
        const   LP.clp.stop_s_1,Stop_s_1
        consth  LP.clp.stop_s_1,Stop_s_1
        const   LP.clp.skip_s_2,Skip_s_2
        consth  LP.clp.skip_s_2,Skip_s_2
        const   LP.clp.stop_s_2,Stop_s_2
        consth  LP.clp.stop_s_2,Stop_s_2
        jmp     L_11
        sub     LP.gen.count,LP.gen.delta_p,1
Table 6. Clipping Tests

Destination   Vector        Near Label   Action If Outside Window
Skip_p        V_CLIP_SKIP   L_11         Decrement primary count
Stop_p        V_CLIP_STOP   L_11         Exit routine
Skip_s_1      V_CLIP_SKIP   L_15         Test side 1 width
Stop_s_1      V_CLIP_STOP   L_15         Exit side 1 loop
Skip_s_2      V_CLIP_SKIP   L_20         Test side 2 width
Stop_s_2      V_CLIP_STOP   L_20         Exit side 2 loop

The clipping method is an extension of the method described for single-width lines in section "P_L1_02.S." The algorithm generates pixel locations and asserts that the locations are inside the clipping window. If they are inside the clipping window, they are drawn. If they are outside the clipping window, the algorithm goes on to the next pixel (if the current pixel is "before" the window) or exits (if the pixel is "after" the window). In the two-dimensional extension implemented in P_L3_02.S, this is done in three phases. The algorithm finds each optimum pixel in the primary direction and does the primary clipping. Then it finds each pixel on side 1 of the optimum pixel (in the secondary direction) and does the first secondary clipping. Finally, it finds each pixel on side 2 of the optimum pixel (in the secondary direction) and does the other secondary clipping. This two-dimensional clipping will result in abrupt ends of lines. Further, the shape of the line end will be different if the line is clipped at a corner of the window. This is shown in Figure 8.
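The skip/stop decision that each phase makes can be modeled in C with explicit comparisons in place of the assert traps. This is a hypothetical sketch (`clip_test` and `walk` are illustrative names). It assumes positions and bounds have already been normalized as described for LP.gen.p and LP.gen.s, so that moving forward always means incrementing the position.

```c
typedef enum { CLIP_DRAW, CLIP_SKIP, CLIP_STOP } ClipAction;

/* Classify one position against normalized clipping bounds. */
static ClipAction clip_test(int pos, int min, int max)
{
    if (pos < min)
        return CLIP_SKIP;   /* window not reached yet: go on to next pixel */
    if (pos > max)
        return CLIP_STOP;   /* window already left: exit */
    return CLIP_DRAW;
}

/* Walk n consecutive positions from pos. Returns how many positions were
   examined before a STOP (n if none) and counts the drawn pixels. */
static int walk(int pos, int n, int min, int max, int *drawn)
{
    *drawn = 0;
    for (int i = 0; i < n; i++, pos++) {
        ClipAction a = clip_test(pos, min, max);
        if (a == CLIP_STOP)
            return i;
        if (a == CLIP_DRAW)
            (*drawn)++;     /* where the pixel routine would be called */
    }
    return n;
}
```

For example, a right-to-left run from x = 20 with the window 8..15 (normalized to position -20 and bounds -15..-8) skips five pixels, draws eight, and stops at the ninth, without visiting the rest of the line.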



The generation loop begins at label L_09. The slope is added into the error term. If the primary error is not negative, it is reduced by the primary delta, and the error term is incremented by 1.



L_09:
        jmpt    LP.gen.x_error,L_10
        add     LP.gen.error,LP.gen.error,LP.gen.slope
        sub     LP.gen.x_error,LP.gen.x_error,LP.gen.delta_p
        add     LP.gen.error,LP.gen.error,1



At label L_10, the primary movement value is added into the address, and the primary clipping position is incremented. If the error term is greater than or equal to 128 (0.5), it is decremented by 1.0 (256; the code subtracts 128 twice), the secondary movement value is added into the address, and the secondary clipping position is incremented.




Figure 8. Wide-Line Clipping 
L_10:
        add     LP.gen.addr,LP.gen.addr,LP.gen.move_p
        cpge    Temp2,LP.gen.error,128
        jmpf    Temp2,L_11
        add     LP.gen.p,LP.gen.p,1
        sub     LP.gen.error,LP.gen.error,128
        sub     LP.gen.error,LP.gen.error,128
        add     LP.gen.addr,LP.gen.addr,LP.gen.move_s
        add     LP.gen.s,LP.gen.s,1
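The fixed-point stepping in L_09 and L_10 can be modeled in C. In this hypothetical sketch, `slope` and `x_slope` are the quotient and remainder of (ds << 8) / dp, as produced by the divide at L_08, and `error` is the secondary offset in 1/256 pixel. The initial value of LP.gen.x_error cannot be recovered from the listing, so -dp is assumed. With that choice, the accumulated error after k steps is exactly floor(256 * k * ds / dp), and the secondary position is that value rounded to the nearest pixel (ties upward).

```c
/* Hypothetical model of the L_09/L_10 error bookkeeping for one line with
   0 <= ds <= dp, dp > 0. s_out[k] receives the secondary position after
   k primary steps; the final secondary position is returned. */
static int step_line(int dp, int ds, int s_out[])
{
    int slope   = (ds << 8) / dp;   /* whole 256ths per step (LP.gen.slope)  */
    int x_slope = (ds << 8) % dp;   /* remainder per step (LP.gen.x_slope)   */
    int x_error = -dp;              /* assumed initial remainder accumulator */
    int error = 0;                  /* LP.gen.error starts at zero           */
    int s = 0;
    for (int k = 1; k <= dp; k++) {
        error += slope;
        x_error += x_slope;
        if (x_error >= 0) {         /* remainders add up to one more 256th */
            x_error -= dp;
            error += 1;
        }
        if (error >= 128) {         /* at least 0.5 pixel: secondary move */
            error -= 256;
            s++;
        }
        s_out[k] = s;
    }
    return s;
}
```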

At label L_11, the optimum pixel location is asserted to be within the primary clipping boundaries. If it is less than the minimum primary clipping boundary, control is transferred to Skip_p because the clipping window has not yet been reached. If it is greater than the maximum primary clipping boundary, control is transferred to Stop_p because the clipping window has just been exited in the primary axis. If the optimum pixel location is inside the primary clipping window, the remaining widths for sides 1 and 2 are set up.

L_11:
        add     LP.clp.skip_vec,LP.clp.skip_p,0
        asge    V_CLIP_SKIP,LP.gen.p,LP.gen.min_p
        add     LP.clp.stop_vec,LP.clp.stop_p,0
        asle    V_CLIP_STOP,LP.gen.p,LP.gen.max_p
        add     LP.wid.side_1,LP.wid.axial,0
        add     LP.wid.side_2,LP.wid.axial,0
        add     LP.gen.try_s,LP.gen.s,0
        add     LP.gen.cover,LP.gen.error,128
        sub     LP.wid.side_1,LP.wid.side_1,LP.gen.cover
        jmpf    LP.wid.side_1,L_12
        subr    Temp2,LP.gen.error,128
        add     LP.gen.cover,LP.gen.cover,LP.wid.side_1

L_12:
        add     LP.gen.cover,LP.gen.cover,Temp2
        sub     LP.wid.side_2,LP.wid.side_2,Temp2
        jmpf    LP.wid.side_2,L_13
        add     LP.loc.addr,LP.gen.addr,0
        add     LP.gen.cover,LP.gen.cover,LP.wid.side_2

At label L_13, the user-supplied routine is called to draw the optimum pixel. The clipping vectors are then set up for side 1.

L_13:
        add     LP.clp.skip_vec,LP.clp.skip_s,0
        asle    V_CLIP_SKIP,LP.gen.try_s,LP.gen.max_s
        asge    V_CLIP_SKIP,LP.gen.try_s,LP.gen.min_s
        calli   ret,GP.pxl.op_vec

Skip_s:
        cpgt    Temp3,LP.wid.side_1,0
        jmpf    Temp3,L_18
        const   LP.gen.cover,256
        add     LP.clp.skip_vec,LP.clp.skip_s_1,0
        add     LP.clp.stop_vec,LP.clp.stop_s_1,0

Label L_14 is the top of the loop that draws pixels on one side (side 1) of the optimum pixel. The algorithm moves away from the optimum pixel in the secondary direction until the remaining width becomes zero or negative. Each pixel is asserted to be within the secondary clipping boundaries at label L_15. If it is greater than the maximum secondary clipping boundary, it is skipped. If it is less than the minimum secondary clipping boundary, the loop exits. If it is within the secondary clipping boundaries, the drawing routine is called with the address and coverage.

L_14:
        sub     LP.loc.addr,LP.loc.addr,LP.gen.move_s
        sub     LP.wid.side_1,LP.wid.side_1,LP.gen.cover
        jmpf    LP.wid.side_1,L_15
        sub     LP.gen.try_s,LP.gen.try_s,1
        add     LP.gen.cover,LP.gen.cover,LP.wid.side_1

L_15:
        asle    V_CLIP_SKIP,LP.gen.try_s,LP.gen.max_s
        asge    V_CLIP_STOP,LP.gen.try_s,LP.gen.min_s
        calli   ret,GP.pxl.op_vec

Skip_s_1:
        cpgt    Temp3,LP.wid.side_1,0
        jmpt    Temp3,L_14

Stop_s_1:
        const   LP.gen.cover,256
        add     LP.gen.try_s,LP.gen.s,0

L_18:
        cpgt    Temp3,LP.wid.side_2,0
        jmpf    Temp3,Skip_p
        add     LP.loc.addr,LP.gen.addr,0
        add     LP.clp.skip_vec,LP.clp.skip_s_2,0
        add     LP.clp.stop_vec,LP.clp.stop_s_2,0
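The coverage values that P_L3_02.S hands to the pixel routine for one primary position can be reproduced with a short C model. `column_coverage` and `side_run` are hypothetical names. `axial` is the axial half-width in 1/256 pixel, and `error` is assumed to lie in [-128, 128) as the L_10 adjustment keeps it. The optimum pixel takes the share of each half-width that falls inside it. Each side then emits full pixels (256) until its width runs out, and the last pixel is partial. The coverages of a column always add up to twice the axial half-width.

```c
/* Emit coverages for one side: 256 per pixel while width remains,
   with the outside pixel only partly covered. */
static int side_run(int remaining, int out[], int n)
{
    while (remaining > 0) {
        int cover = 256;              /* LP.gen.cover reset to 256 */
        remaining -= cover;
        if (remaining < 0)
            cover += remaining;       /* partial outside pixel */
        out[n++] = cover;
    }
    return n;
}

/* Hypothetical model of one column: out[0] is the optimum pixel, followed
   by the side 1 pixels and then the side 2 pixels. Returns the count. */
static int column_coverage(int axial, int error, int out[])
{
    int side1 = axial, side2 = axial;
    int cover = error + 128;          /* optimum pixel's part on side 1 */
    side1 -= cover;
    if (side1 < 0)
        cover += side1;               /* line narrower than that part */
    int part2 = 128 - error;          /* optimum pixel's part on side 2 */
    cover += part2;
    side2 -= part2;
    if (side2 < 0)
        cover += side2;
    out[0] = cover;
    int n = side_run(side1, out, 1);
    return side_run(side2, out, n);
}
```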

Label L_19 is the top of the loop that draws pixels on the second side (side 2) of the optimum pixel. This is exactly the same as side 1, except that pixels are drawn on the other side of the optimum pixel.

L_19:
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_s
        sub     LP.wid.side_2,LP.wid.side_2,LP.gen.cover
        jmpf    LP.wid.side_2,L_20
        add     LP.gen.try_s,LP.gen.try_s,1
        add     LP.gen.cover,LP.gen.cover,LP.wid.side_2

L_20:
        asge    V_CLIP_SKIP,LP.gen.try_s,LP.gen.min_s
        asle    V_CLIP_STOP,LP.gen.try_s,LP.gen.max_s
        calli   ret,GP.pxl.op_vec

Skip_s_2:
        cpgt    Temp3,LP.wid.side_2,0
        jmpt    Temp3,L_19

Stop_s_2:
        const   LP.gen.cover,256

Skip_p:
        jmpfdec LP.gen.count,L_09
        add     LP.gen.x_error,LP.gen.x_error,LP.gen.x_slope

Stop_p:

P_L4_01.S

The routine P_L4_01.S draws single-width lines in a monochrome bit map. The pixels in the bit map are assumed to be packed 32 to a word. Bit 31 of a word is assumed to be displayed to the left of bit 30. Lower-addressed words are displayed to the left of, and above, higher-addressed words.

The caller of P_L4_01.S must provide a subroutine to perform the actual writes to the bit map. The user routine is called with the linear address of the 32-bit word in the bit map and a mask with 1s for the pixels within the word that must be written. This is illustrated for a 16-bit word in Figure 9.

If the line being drawn is close to horizontal, multiple pixels will occupy any given word. By accumulating pixels until the line enters the "next" word, memory accesses can be amortized over several pixels. This does not help with lines that are closer to vertical than 45 degrees, and is not used for lines where the primary axis is y.

The routine begins with the normal global functions.

Ten parameters are loaded from the structure G29K_Params.

        GP.pxl.value        lr6
        GP.mem.width        lr7
        GP.mem.depth        lr8
        GP.wnd.base         lr9
        GP.wnd.align        lr10
        GP.pxl.op_vec       lr11
        GP.pxl.in_mask      lr12
        GP.pxl.do_mask      lr13
        GP.pxl.do_value     lr14
        GP.pxl.out_mask     lr15

        const   Temp0,_G29K_Params + 4 * 4
        consth  Temp0,_G29K_Params + 4 * 4
        mtsrim  cr,(10 - 1)
        loadm   0,0,GP.pxl.value,Temp0



Routine S_M1_01 is called to convert the Start.x, Start.y pair to a linear address, which is returned in LP.loc.addr. The bit position within the word is indicated by LP.loc.align, which is the number of bits to the left of the addressed bit within the addressed word.

        add     LP.loc.x,Start.x,0
        call    ret,S_M1_01
        add     LP.loc.y,Start.y,0
        cpeq    LP.gen.cover,LP.gen.cover,LP.gen.cover
        srl     LP.gen.cover,LP.gen.cover,LP.loc.align
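The last two instructions build the initial single-pixel mask. Comparing a register with itself yields the Am29000 Boolean TRUE (only bit 31 set), and shifting right by LP.loc.align moves that bit to the pixel's position. In C (`pixel_mask` is a hypothetical name):

```c
#include <stdint.h>

/* Single-pixel mask for a monochrome word in which bit 31 is the leftmost
   pixel; align is the number of pixels to its left (0..31). */
static uint32_t pixel_mask(unsigned align)
{
    uint32_t mask = 0x80000000u;  /* TRUE, as produced by cpeq x,x,x */
    return mask >> align;         /* srl by LP.loc.align */
}
```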

This routine assumes that the primary direction will be x and the secondary direction will be y. The primary delta is calculated by subtracting Start.x from Finish.x. The sign bit is left in the error term (0, or -1 when Finish.x is less than Start.x) for reversibly retraceable lines. The primary movement value is set to 4. If Finish.x is less than Start.x, the primary delta is negated, and the primary movement value is set to negative 4.



        sub     LP.gen.delta_p,Finish.x,Start.x
        sra     LP.gen.error,LP.gen.delta_p,31
        jmpf    LP.gen.delta_p,L_01
        const   LP.gen.move_p,4
        subr    LP.gen.delta_p,LP.gen.delta_p,0
        constn  LP.gen.move_p,-4
Figure 9. Monochrome Lines (example mask for a 16-bit word: 0000 0111 1111 1100)



At label L_01, the secondary delta is computed by subtracting Start.y from Finish.y, and the secondary movement value is set to the negative of the memory width. If Finish.y is less than Start.y, the secondary delta is negated, and the secondary movement value is set to the memory width.



L_01:
        sub     LP.gen.delta_s,Finish.y,Start.y
        jmpf    LP.gen.delta_s,L_02
        subr    LP.gen.move_s,GP.mem.width,0
        subr    LP.gen.delta_s,LP.gen.delta_s,0
        add     LP.gen.move_s,GP.mem.width,0

L_02:
        cpge    Temp0,LP.gen.delta_p,LP.gen.delta_s
        jmpt    Temp0,L_03
        cpeq    LP.gen.try_s,LP.gen.try_s,LP.gen.try_s
        xor     LP.gen.delta_p,LP.gen.delta_p,LP.gen.delta_s
        xor     LP.gen.delta_s,LP.gen.delta_p,LP.gen.delta_s
        xor     LP.gen.delta_p,LP.gen.delta_p,LP.gen.delta_s
        xor     LP.gen.move_p,LP.gen.move_p,LP.gen.move_s
        xor     LP.gen.move_s,LP.gen.move_p,LP.gen.move_s
        xor     LP.gen.move_p,LP.gen.move_p,LP.gen.move_s
        const   LP.gen.try_s,0

L_03:
        sll     LP.gen.slope,LP.gen.delta_s,1
        sll     LP.gen.x_slope,LP.gen.delta_p,1
        add     LP.gen.error,LP.gen.error,LP.gen.slope
        jmpt    LP.gen.try_s,L_11
        sub     LP.gen.error,LP.gen.error,LP.gen.delta_p
        jmpt    LP.gen.move_s,L_07
        add     LP.gen.move_s,LP.gen.move_p,LP.gen.move_s
        mtsrim  fc,31
        sub     LP.gen.count,LP.gen.delta_p,1
        sub     LP.gen.slope,LP.gen.slope,LP.gen.x_slope

The deltas are compared to test the initial assumption that x is the primary direction. If the primary direction is y, the primary and secondary deltas and movement values must be swapped. A flag is set in LP.gen.try_s to indicate the primary axis; this will be used to choose a loop. The primary and secondary error increments are calculated.

These error increments are derived from those de- 
scribed in Bresenham's algorithm. The initial error term 
is calculated. The vector can now be drawn. 
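For reference, here is the textbook form of the integer Bresenham stepping that these increments come from, as a minimal C sketch with hypothetical names. It covers the first octant (0 <= ds <= dp, dp > 0) and uses the same increments as the setup code: slope = 2*ds, x_slope = 2*dp, and an initial error of 2*ds - dp. It omits two details of the routine: the sign bias that keeps lines reversibly retraceable, and the way the assembly loops fold the two increments together around their delay slots.

```c
/* Textbook first-octant Bresenham; ys[p] receives the secondary
   coordinate of the pixel drawn at primary step p (0..dp). */
static void bresenham(int dp, int ds, int ys[])
{
    int slope = 2 * ds;           /* added every step */
    int x_slope = 2 * dp;         /* removed on a secondary move */
    int error = slope - dp;       /* initial error term */
    int s = 0;
    for (int p = 0; p <= dp; p++) {
        ys[p] = s;                /* pixel (p, s) is drawn */
        if (error >= 0) {         /* secondary move needed */
            s++;
            error -= x_slope;
        }
        error += slope;
    }
}
```

Each drawn pixel is within half a pixel of the true line, and the last pixel lands exactly on the end point.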

There are actually four loops (see Table 7). One will be 
chosen depending on the primary axis, and depending 
on the direction in which the line is to be drawn. 

Table 7. Loop Top and Line Direction

Primary    Direction     Loop Top
y-axis     Positive x    L_04
y-axis     Negative x    L_07
x-axis     Positive y    L_11
x-axis     Negative y    L_16



The loop for the case where the primary axis is the y axis and movement is in the positive x direction begins at label L_04. The Funnel Count register is set to 31 so that the extract between label L_04 and label L_05 actually cycles LP.gen.cover (the pixel mask) 1 bit to the right.
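Why a funnel count of 31 gives a right cycle, and a count of 1 (used at L_07 and L_16) gives a left cycle, can be checked with a small C model of extract when both source operands are the same register. The model assumes the usual funnel-shift definition: the 64-bit pair ra:rb is shifted left by FC, and the upper word is kept. `extract_self` is a hypothetical name.

```c
#include <stdint.h>

/* Model of "extract r,x,x" with funnel count fc (0..31): the pair x:x is
   shifted left by fc and the upper 32 bits are returned. */
static uint32_t extract_self(uint32_t x, unsigned fc)
{
    uint64_t pair = ((uint64_t)x << 32) | x;
    return (uint32_t)((pair << fc) >> 32);
}
```

With fc = 31 this equals (x >> 1) | (x << 31), a one-bit right rotation. With fc = 1 it equals (x << 1) | (x >> 31), a one-bit left rotation.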
At label L_04, the error is tested and incremented by the slope. If movement in the secondary direction is not necessary, the jump to L_05 is taken. The pixel is written using the current mask (LP.gen.cover), and LP.gen.x_slope is added to the error term. At label L_06, the loop count is decremented and tested, and the primary movement is added to the location.



L_04:
        jmpt    LP.gen.error,L_05
        add     LP.gen.error,LP.gen.error,LP.gen.slope
        calli   ret,GP.pxl.op_vec
        cpeq    LP.loc.align,LP.gen.cover,1
        jmpf    LP.loc.align,L_06
        extract LP.gen.cover,LP.gen.cover,LP.gen.cover
        jmpfdec LP.gen.count,L_04
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_s
        jmp     L_22
        nop

L_05:
        calli   ret,GP.pxl.op_vec
        add     LP.gen.error,LP.gen.error,LP.gen.x_slope

L_06:
        jmpfdec LP.gen.count,L_04
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_p
        jmp     L_22
        nop





If movement in the secondary direction is needed, the current pixel is written, and the mask is compared to the value 1. If the mask is not equal to 1, it can be right-shifted and remain in the same word. If this is the case, the jump to L_06 is taken. The loop count is decremented and tested, and the address is modified by the primary movement value.

If the mask is 1, it is at the right edge of the word, and it is necessary to change the address into the next secondary word. The jump to L_06 is not taken, and the address is modified by the secondary movement value, which the setup code has already combined with the primary movement value.

It is not possible to complete this loop without modifying the address with one movement value or the other. This is reasonable, since no word ever has more than a single pixel written.

The case where the primary direction is y and there is negative movement in x is essentially the same. The Funnel Count register is set to 1 so that the extract is a left cycle of 1 bit.

L_07:
        mtsrim  fc,1
        sub     LP.gen.count,LP.gen.delta_p,1

L_08:
        jmpt    LP.gen.error,L_09
        add     LP.gen.error,LP.gen.error,LP.gen.slope
        calli   ret,GP.pxl.op_vec
        sub     LP.gen.error,LP.gen.error,LP.gen.x_slope
        jmpf    LP.gen.cover,L_10
        extract LP.gen.cover,LP.gen.cover,LP.gen.cover
        jmpfdec LP.gen.count,L_08
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_s
        jmp     L_22
        nop

L_09:
        calli   ret,GP.pxl.op_vec
        nop

L_10:
        jmpfdec LP.gen.count,L_08
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_p
        jmp     L_22
        nop

Label L_11 begins the case where primary movement is along the x axis. In this case, multiple pixels can be written into a word. If positive y-axis movement is required, the loop at L_12 is used.

L_11:
        jmpt    LP.gen.move_p,L_16
        add     LP.loc.align,LP.gen.cover,0
        mtsrim  fc,31
        sub     LP.gen.x_slope,LP.gen.slope,LP.gen.x_slope
        sub     LP.gen.count,LP.gen.delta_p,1
        jmpt    LP.gen.count,L_21
        sub     LP.gen.count,LP.gen.count,1



L_12:
        jmpt    LP.gen.error,L_13
        extract LP.loc.align,LP.loc.align,LP.loc.align
        calli   ret,GP.pxl.op_vec
        add     LP.gen.error,LP.gen.error,LP.gen.x_slope
        jmpt    LP.loc.align,L_14
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_s
        jmpfdec LP.gen.count,L_12
        add     LP.gen.cover,LP.loc.align,0
        jmp     L_21
        nop
The Funnel Count register is set to 31 so that the extract just past L_12 will be a right cycle of 1 bit. The loop count is set to the primary delta minus one; if this is negative, only a single pixel is written at L_21.

At label L_12, the error term is tested, and LP.loc.align is right-cycled 1 bit. This register always has a single 1, and is right-cycled 1 bit for every pass through the loop.

If LP.gen.error is negative, control passes to L_13. Here, LP.loc.align is tested to determine whether the single 1 is in bit 31, and the error term is adjusted for a primary move. If the bit in LP.loc.align is not in position 31, the pixel is still in the same word. Control passes to label L_15, where the loop count is decremented and tested, and the new bit is ORed into the mask, LP.gen.cover. This continues until either a movement in the secondary direction is needed, in which case the write function is called just past label L_12, or the line moves into the next word in the primary direction, in which case the write function is called just past label L_13.
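The accumulation idea can be modeled in C independently of the register details. In this hypothetical sketch, the pixels of an x-major line are supplied as coordinate lists, the word index is taken as x >> 5 (a window alignment of zero is assumed), and bit 31 is the leftmost pixel. A mask is written out only when the line leaves the current word or changes scan line.

```c
#include <stdint.h>

typedef struct { int word; int row; uint32_t mask; } Write;

/* Hypothetical model: collect pixels into one mask per (word, row) and
   emit a write each time the word or the row changes. */
static int accumulate(const int *xs, const int *ys, int n, Write *out)
{
    int count = 0;
    uint32_t mask = 0;
    int cur_word = xs[0] >> 5, cur_row = ys[0];
    for (int i = 0; i < n; i++) {
        int word = xs[i] >> 5;
        if (word != cur_word || ys[i] != cur_row) {
            out[count++] = (Write){ cur_word, cur_row, mask };  /* flush */
            mask = 0;
            cur_word = word;
            cur_row = ys[i];
        }
        mask |= 0x80000000u >> (xs[i] & 31);  /* OR the pixel into the mask */
    }
    out[count++] = (Write){ cur_word, cur_row, mask };
    return count;
}
```

A horizontal run from x = 30 to x = 35 therefore costs two writes (masks 0x00000003 and 0xF0000000) instead of six.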



L_13:
        jmpf    LP.loc.align,L_15
        add     LP.gen.error,LP.gen.error,LP.gen.slope
        calli   ret,GP.pxl.op_vec
        nop

L_14:
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_p
        const   LP.gen.cover,0

L_15:
        jmpfdec LP.gen.count,L_12
        or      LP.gen.cover,LP.gen.cover,LP.loc.align
        jmp     L_21
        nop







Label L_16 begins the case where primary movement is along the x axis, and negative movement is needed in the y axis. It is essentially similar to the code at L_12, except that the single 1 cycles to the left rather than the right.

L_16:
        mtsrim  fc,1
        sub     LP.gen.count,LP.gen.delta_p,1
        jmpt    LP.gen.count,L_21
        sub     LP.gen.count,LP.gen.count,1

L_17:
        jmpt    LP.gen.error,L_19
        add     LP.gen.error,LP.gen.error,LP.gen.slope
        jmpf    LP.loc.align,L_18
        sub     LP.gen.error,LP.gen.error,LP.gen.x_slope
        calli   ret,GP.pxl.op_vec
        extract LP.loc.align,LP.loc.align,LP.loc.align
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_p
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_s
        jmpfdec LP.gen.count,L_17
        add     LP.gen.cover,LP.loc.align,0
        jmp     L_21
        nop

L_18:
        calli   ret,GP.pxl.op_vec
        extract LP.loc.align,LP.loc.align,LP.loc.align
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_s
        jmpfdec LP.gen.count,L_17
        add     LP.gen.cover,LP.loc.align,0
        jmp     L_21
        nop

L_19:
        jmpf    LP.loc.align,L_20
        extract LP.loc.align,LP.loc.align,LP.loc.align
        calli   ret,GP.pxl.op_vec
        nop
        add     LP.loc.addr,LP.loc.addr,LP.gen.move_p
        const   LP.gen.cover,

L_20:
        jmpfdec LP.gen.count,L_17
        or      LP.gen.cover,LP.gen.cover,LP.loc.align

L_21:
        calli   ret,GP.pxl.op_vec
        nop

S_M1_01.S

The routine S_M1_01.S is an internal subroutine that calculates the linear address of a pixel from the x and y coordinates. It is not a C-language callable routine.

Figure 10 is a diagram of the memory system. Location zero is shown at the top, and machine addresses increase from top to bottom. This is also the way the bit map is expected to be displayed.

The routines are written so that higher y coordinates are written into lower machine addresses and therefore displayed higher on the screen. Higher x coordinates are shown further to the right on the screen.

The memory width is multiplied by the y address, and the result is left in LP.loc.addr. It is negated, since it must be subtracted from the window base in order to appear higher on the screen. GP.mem.depth is then tested to determine whether the bit map is monochrome.
Figure 10. X-Y Translation (memory diagram: location 0 at the top; GP.mem.width, LP.loc.x, LP.loc.y, and wnd.base marked)



If the bit map is not monochrome, the jump is not taken. The x pixel location is multiplied by the number of bytes per pixel (1, 2, or 4), and the result is added into LP.loc.addr. Then the window base is added into LP.loc.addr, and the routine exits. LP.loc.align is left holding the low-order 2 bits of the address times eight. This indicates which byte or half-word is being addressed in bit maps with 8 or 16 bits per pixel, in a manner similar to the bit position indicator for a monochrome display.

If the bit map is monochrome, the jump to label $01 is taken. The window alignment (offset into a word) is added to the x location, and the low-order 5 bits are left in LP.loc.align. This indicates the bit within the word that corresponds to the pixel coordinates: the number of bits to the left of the addressed bit. The byte address corresponding to the x location is added to LP.loc.addr, and the window base is added into LP.loc.addr. The routine then exits.
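The address calculation can be summarized in a C model. `PixelAddr` and `pixel_address` are hypothetical names. The sketch also makes three assumptions not taken from the listing: GP.mem.depth holds the number of bits per pixel, with 1 meaning monochrome; bytes per pixel is depth / 8; and the monochrome byte address is that of the containing 32-bit word.

```c
#include <stdint.h>

typedef struct { int32_t addr; int align; } PixelAddr;

/* Hypothetical model of S_M1_01: base = GP.wnd.base, width = GP.mem.width
   (bytes per scan line), wnd_align = GP.wnd.align, depth = bits per pixel. */
static PixelAddr pixel_address(int32_t base, int32_t width, int wnd_align,
                               int depth, int x, int y)
{
    PixelAddr r;
    r.addr = -(y * width);              /* higher y -> lower address */
    if (depth != 1) {
        r.addr += x * (depth / 8);      /* 1, 2, or 4 bytes per pixel */
        r.addr += base;
        r.align = (r.addr & 3) * 8;     /* low-order 2 bits times eight */
    } else {
        int bit = x + wnd_align;        /* pixel offset including alignment */
        r.align = bit & 31;             /* bits to the left within the word */
        r.addr += (bit >> 5) * 4;       /* byte address of the 32-bit word */
        r.addr += base;
    }
    return r;
}
```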

S_C1_01.S

There are three routines in file S_C1_01.S: _S_C1_01, S_C2_01, and S_C2_02.

Routine _S_C1_01 (which is contained in Appendix A) loads the clipping trap vectors to point to their handlers. The code assumes that the Vector Fetch (VF) bit in the Configuration Register (CFG) is 1, which means that the Vector Area is a table of vectors.

The address of S_C2_01 is stored into absolute location decimal 400, and the address of S_C2_02 is stored into absolute location decimal 404. Since the R bit(s) are not set, the routines are expected to be in instruction/data memory.



When a Clipping Skip Trap occurs, control is transferred to routine S_C2_01. It executes in freeze mode and supervisor mode. The routine sets pc1 to the value contained in LP.clp.skip_vec and sets pc0 to the value contained in LP.clp.skip_vec + 4. It exits with an iret instruction, which transfers control to the skipping destination in the rendering routine.

When a Clipping Stop Trap occurs, control is transferred to routine S_C2_02. It also executes in freeze mode and supervisor mode; one would expect an operating system to handle this normally. The routine sets pc1 to the value contained in LP.clp.stop_vec and sets pc0 to the value contained in LP.clp.stop_vec + 4. It exits with an iret instruction, which transfers control to the stopping destination in the rendering routine.

This method of clipping is most efficient when little clipping is done. In the non-clipped case, it requires only four cycles per pixel, one for each of the four assert instructions.

A skip-if-clipped exception is fairly expensive in terms of cycles, requiring the following operations for each skip:

1. assert instruction.

2. Fetch vector.

3. Fetch and execute trap handler.

4. Restart at skip destination.

If many skips are expected, it may be better to replace the skip asserts with explicit compare/jump combinations. In this case, each skip test requires only two cycles per pixel, regardless of whether clipping occurs.

Since the Stop Trap can occur only once per object, it is probably most efficient to implement it with an assert instruction, as is done here.
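The comparison can be written as a rough cost model. Here T stands for the extra cycles one skip trap costs (vector fetch, trap handler, and restart); the text gives no figure for it, so it is left symbolic. p is the fraction of skip tests that actually skip. An assert test costs one cycle when it does not trap, while a compare/jump pair costs two cycles either way:

```latex
\begin{aligned}
C_{\text{assert}} &= 1 + pT, \qquad C_{\text{cmp/jmp}} = 2,\\
C_{\text{assert}} < C_{\text{cmp/jmp}} &\iff p < \frac{1}{T}.
\end{aligned}
```

So if a skip trap cost, say, 20 extra cycles, the assert version would win only while fewer than 1 test in 20 actually skips.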

Copy Block Routines 

There are a total of eight functions for Copy Block, which 
are listed in Table 8. 

Table 8. Copy Block Routines

    Routine      Function
    P_B1_01.S    Color, copy only, no clipping
    P_B1_02.S    Color, copy only, clipping
    P_B2_01.S    Color, general operation, no clipping
    P_B2_02.S    Color, general operation, clipping
    P_B3_01.S    Monochrome, copy only, no clipping
    P_B3_02.S    Monochrome, copy only, clipping
    P_B4_01.S    Monochrome, general operation, no clipping
    P_B4_02.S    Monochrome, general operation, clipping

All Copy Block routines use the Am29000 load- and 
store-multiple instructions, proceeding one scan line at 
a time. As many pixels as will fit into a reserved block of 
registers are fetched with a Load Multiple, and are 
placed in the destination area with a Store Multiple. The 
use of Load and Store Multiple instructions allows the 
memory system to run at maximum speed. If the mem- 
ory system supports burst-mode reads and writes, it is 
possible to load or store a 32-bit word every cycle. 
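The per-scan-line scheme can be sketched in C. This is an illustration, not a translation of the assembly: copy_block and its parameters are hypothetical, strides are in 32-bit words rather than bytes, and memcpy into and out of a local array stands in for Load Multiple and Store Multiple on the reserved register block. As in the routines, the odd-sized group is moved first, every later group is exactly MAX_WORDS words, and the copy always runs from lower to higher addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_WORDS 32  /* size of the reserved register block (32 by default) */

/* Hypothetical C rendering of the copy-only scheme for 32-bit pixels. */
static void copy_block(uint32_t *dst, int dst_stride,
                       const uint32_t *src, int src_stride,
                       int size_x, int size_y)
{
    uint32_t regs[MAX_WORDS];   /* stands in for the reserved registers */

    if (size_x <= 0 || size_y <= 0)
        return;                                 /* nothing to move */

    for (int y = 0; y < size_y; ++y) {          /* one scan line at a time */
        const uint32_t *s = src + (long)y * src_stride;
        uint32_t *d = dst + (long)y * dst_stride;
        int n = (size_x - 1) % MAX_WORDS + 1;   /* odd-sized group goes first */

        for (int left = size_x; left > 0; left -= n, n = MAX_WORDS) {
            assert(n <= left);
            memcpy(regs, s, (size_t)n * sizeof regs[0]);  /* like Load Multiple */
            memcpy(d, regs, (size_t)n * sizeof regs[0]);  /* like Store Multiple */
            s += n;
            d += n;
        }
    }
}
```

A 37-pixel scan line, for instance, moves as one 5-word group followed by one 32-word group.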

The size of the block of reserved registers is set by a pair of .equ statements in G29K_REG.h. The constant MAX_SHIFT must be set to correspond to the constant MAX_WORDS, as indicated in Table 9; that is, MAX_WORDS must equal 2 raised to the power MAX_SHIFT. Values other than those in the table will not work.





Table 9. Block Size Values

    MAX_WORDS    MAX_SHIFT
    32           5
    16           4
    8            3
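Because the routines compute the first-group size with an and against (MAX_WORDS - 1) and the group count with a right shift by MAX_SHIFT, the two constants agree only when MAX_WORDS is 2 raised to MAX_SHIFT. A C build of the same idea could enforce the pairing at compile time. The macro names are reused from the text, but the guard itself is an illustrative assumption, not something G29K_REG.h is known to contain.

```c
#include <assert.h>

#define MAX_WORDS 32
#define MAX_SHIFT 5

/* Reject any pairing that is not in Table 9 (8/3, 16/4, 32/5). */
static_assert((1 << MAX_SHIFT) == MAX_WORDS, "MAX_SHIFT does not match MAX_WORDS");
static_assert(MAX_SHIFT >= 3 && MAX_SHIFT <= 5, "MAX_WORDS must be 8, 16, or 32");

/* Words left over for the first group: n mod MAX_WORDS via a mask. */
static unsigned odd_words(unsigned n) { return n & (MAX_WORDS - 1u); }

/* Whole groups: n / MAX_WORDS via a shift. */
static unsigned whole_groups(unsigned n) { return n >> MAX_SHIFT; }
```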



If MAX_WORDS is set to any value other than 32, the 
Byte Pointer register arrays should be compressed, and 
the BLOCK_PRIMITIVE .equ statement in g29K_reg.h 
should be adjusted to reflect the change. If this is not 
done, there is no reason to change the .equ statements. 

The tradeoff here is between the potential for a spill/fill when MAX_WORDS is large and the greater number of Load-Multiple/Store-Multiple instructions when MAX_WORDS is small. If only a few scan lines are to be moved, the Load-Multiple/Store-Multiple overhead saved by a large MAX_WORDS may not be worth the spill/fill. If many scan lines are to be moved, the spill/fill is better amortized.



These routines do not check for overlapping blocks 
and always proceed from lower addresses to higher 
addresses. 

All the copy routines begin with the same declarations. The function name is declared to be global, the ENTER macro is used to specify that 111 general registers are required, and the routine name appears as a label.

        .global _P_B1_01
        ENTER   BLOCK_PRIMITIVE
    _P_B1_01:



The eight parameter register names are declared with PARAM macros. These assign local register numbers above (higher than) the registers previously defined. The parameters are passed in the local registers shown below:

    Macro    Register Name    Register Number
    PARAM    Dest.x           in 11
    PARAM    Dest.y           in 12
    PARAM    Size.x           in 13
    PARAM    Size.y           in 14
    PARAM    Source.x         in 15
    PARAM    Source.y         in 16
    PARAM    Source.b         in 17
    PARAM    Source.w         in 18



The CLAIM macro is the function prologue. If a spill operation is not necessary, it requires five instructions. If a spill is necessary, the standard SPILL routine is invoked, which may involve a Load/Store Multiple instruction.

P_B1_01.S

Routine P_B1_01.S is a C-language callable program 
that performs a copy block operation in a color 
(32-plane) bit map. No clipping is performed. This rou- 
tine is optimized for moving data in deep bit maps. 

The routine begins with the normal global declarations.

Four parameters are loaded from the structure G29K_Params:

    GP.mem.width    lr7
    GP.mem.depth    lr8
    GP.wnd.base     lr9
    GP.wnd.align    lr10

        const   Temp1, _G29K_Params + (5*4)
        consth  Temp1, _G29K_Params + (5*4)
        mtsrim  cr, (4 - 1)
        loadm   0, 0, GP.mem.width, Temp1

The routine checks that the size of the block is positive in both dimensions. If either dimension is zero or negative, it exits immediately.

        sub     Size.x, Size.x, 1
        jmpt    Size.x, L_04
        sub     Size.y, Size.y, 1
        jmpt    Size.y, L_04
        sub     Size.y, Size.y, 1

Routine S_M1_01 is called to convert the destination address to a linear address. The linear destination address is left in BP.dst.lft_addr.

        add     LP.loc.x, Dest.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Dest.y, 0
        add     BP.dst.lft_addr, LP.loc.addr, 0

Routine S_M1_01 is called a second time to convert the source address to a linear address. The source bit-map width and base can be different from the global bit-map width and base, so GP.mem.width is saved in Temp1 and restored after the call. The linear address of the source is left in BP.src.lft_addr. This address and BP.dst.lft_addr point to the left edge of the source and destination bit maps, respectively. They will be modified at the end of each scan line by Source.w and GP.mem.width, respectively.

        add     Temp1, GP.mem.width, 0
        add     GP.mem.width, Source.w, 0
        add     GP.wnd.base, Source.b, 0
        add     LP.loc.x, Source.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Source.y, 0
        add     BP.src.lft_addr, LP.loc.addr, 0
        add     GP.mem.width, Temp1, 0

The number of groups per scan line is calculated by right 
shifting the x dimension of the block size. The number of 
pixels in the first (or only) group is calculated. 

        and     BP.fst.count, Size.x, (MAX_WORDS - 1)
        srl     BP.grp.repeat, Size.x, MAX_SHIFT
        sub     BP.grp.repeat, BP.grp.repeat, 1
        sll     BP.fst.incr, BP.fst.count, 2
        add     BP.fst.incr, BP.fst.incr, 4
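The register values set up by this sequence can be checked with a C model. It assumes, as the code does, that Size.x has already been reduced by 1 and that each color pixel occupies one 32-bit word. plan_groups and its fields are hypothetical names. BP.fst.count ends up as the word count of the first group minus 1 (the form the cr register expects), BP.grp.repeat as the number of full groups minus 1, and BP.fst.incr as the byte size of the first group.

```c
#include <assert.h>

#define MAX_WORDS 32
#define MAX_SHIFT 5

typedef struct {
    int fst_count;   /* BP.fst.count: words in the first group, minus 1 */
    int grp_repeat;  /* BP.grp.repeat: full groups that follow, minus 1 */
    int fst_incr;    /* BP.fst.incr: bytes moved by the first group */
} group_plan;

static group_plan plan_groups(int width_pixels)
{
    assert(width_pixels > 0);
    int size_x = width_pixels - 1;               /* after sub Size.x, Size.x, 1 */
    group_plan p;
    p.fst_count = size_x & (MAX_WORDS - 1);      /* and */
    p.grp_repeat = (size_x >> MAX_SHIFT) - 1;    /* srl, then sub */
    p.fst_incr = (p.fst_count << 2) + 4;         /* sll, then add */
    return p;
}
```

A negative BP.grp.repeat (for example, for a line of 32 pixels or fewer) is what makes the later jmpt on BP.grp.count bypass the group loop.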

The movement of each scan line begins at label L_01 . 
The current values of the group count and source and 
destination addresses are copied into working registers. 



    L_01:
        add     BP.grp.count, BP.grp.repeat, 0
        add     BP.dst.addr, BP.dst.lft_addr, 0
        add     BP.src.addr, BP.src.lft_addr, 0
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.dst.array, BP.src.addr
        mtsr    cr, BP.fst.count
        jmpt    BP.grp.count, L_03
        storem  0, 0, BP.dst.array, BP.dst.addr
        add     BP.dst.addr, BP.dst.addr, BP.fst.incr
        add     BP.src.addr, BP.src.addr, BP.fst.incr
        sub     BP.grp.count, BP.grp.count, 1



The first (or only) group of the scan line is loaded and then stored. If the first group is the only group, the code jumps to L_03. Otherwise, the source and destination addresses are incremented by the number of bytes in the first group, and the group count is decremented.

The loop that moves the second and subsequent groups of each scan line begins at label L_02. Each group moved in the loop is of maximum size; the odd group, if any, was moved first. The group is loaded into the register block and the source address is incremented. The group is stored from the register block and the destination address is incremented. The loop count is decremented and tested.

    L_02:
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.dst.array, BP.src.addr
        add     BP.src.addr, BP.src.addr, (MAX_WORDS * 4)
        mtsrim  cr, (MAX_WORDS - 1)
        storem  0, 0, BP.dst.array, BP.dst.addr
        jmpfdec BP.grp.count, L_02
        add     BP.dst.addr, BP.dst.addr, (MAX_WORDS * 4)

At label L_03, the pixels for a scan line have all been moved. The left-edge addresses, BP.dst.lft_addr and BP.src.lft_addr, are incremented by the widths of the respective bit maps, and the scan line count is decremented and tested.

    L_03:
        add     BP.dst.lft_addr, BP.dst.lft_addr, GP.mem.width
        jmpfdec Size.y, L_01
        add     BP.src.lft_addr, BP.src.lft_addr, Source.w
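Both loops close with jmpfdec, which branches while its counter is non-negative (sign bit clear) and decrements the counter as part of the test. A loop whose body is entered once from above therefore runs the counter's initial value plus 2 times, which is why Size.y is reduced twice at entry and BP.grp.count is reduced once more after the first group. The C model below is a sketch of that counting only; jmpfdec_runs is a hypothetical name, and it decrements on every test (decrementing only when the branch is taken would give the same count).

```c
#include <assert.h>
#include <stdint.h>

/* Counts how often a loop body runs when the body is entered once from
   above and the loop is closed by jmpfdec on the given counter. */
static int jmpfdec_runs(int32_t counter)
{
    int runs = 0;
    int taken;
    assert(counter >= -1);          /* the routines never start lower */
    do {
        ++runs;                     /* loop body */
        taken = (counter >= 0);     /* "false" means sign bit clear */
        counter -= 1;               /* decrement during the test */
    } while (taken);
    return runs;
}
```

With a block height h, Size.y enters the scan-line loop as h - 2, and the model confirms that h scan lines are moved.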

P_B1_02.S 

Routine P_B1_02.S is a C-language callable program 
that performs a copy-block operation in a color (32- 
plane) bit map. Clipping is performed. This routine is op- 
timized for moving data in deep bit maps. 

The routine begins with the normal global declarations.

Nine parameters are loaded from the structure G29K_Params:



    GP.wnd.min_x    lr2
    GP.wnd.max_x    lr3
    GP.wnd.min_y    lr4
    GP.wnd.max_y    lr5
    GP.pxl.value    lr6
    GP.mem.width    lr7
    GP.mem.depth    lr8
    GP.wnd.base     lr9
    GP.wnd.align    lr10

        const   Temp1, _G29K_Params
        consth  Temp1, _G29K_Params
        mtsrim  cr, (9 - 1)
        loadm   0, 0, GP.wnd.min_x, Temp1

The routine determines the size and location of the destination region that overlaps the clipping window.

        sub     Temp1, Dest.x, GP.wnd.min_x
        jmpf    Temp1, L_01
        add     Temp2, GP.mem.width, 0
        add     Size.x, Size.x, Temp1
        sub     Source.x, Source.x, Temp1
        add     Dest.x, GP.wnd.min_x, 0

Figure 11 shows how the source is cropped so that it becomes the same size as the overlap of the destination and the clipping window.

If necessary, the left edges of the source and destination blocks are cropped: Size.x is reduced and Source.x is increased by the amount that Dest.x lies to the left of the window, and Dest.x is set to the left edge of the window.



    L_01:
        add     Temp1, Dest.x, Size.x
        sub     Temp1, Temp1, 1
        sub     Temp1, GP.wnd.max_x, Temp1
        jmpf    Temp1, L_02
        sub     Size.x, Size.x, 1
        add     Size.x, Size.x, Temp1
    L_02:
        jmpt    Size.x, L_08
        sub     Temp1, GP.wnd.max_y, Dest.y

The right edge of the destination block is cropped to the right edge of the clipping window, and Size.x is reduced if necessary. If the result is less than or equal to zero, the routine exits.

The top edge of the destination block is cropped to the top of the clipping window. If necessary, Source.y and Size.y are adjusted.

        jmpf    Temp1, L_03
        sub     Temp3, GP.wnd.min_y, 1
        add     Size.y, Size.y, Temp1
        add     Source.y, Source.y, Temp1
        add     Dest.y, GP.wnd.max_y, 0
    L_03:
        sub     Temp1, Dest.y, Size.y
        sub     Temp1, Temp1, Temp3
        jmpf    Temp1, L_04
        sub     Size.y, Size.y, 1
        add     Size.y, Size.y, Temp1



Figure 11. Cropping (diagram: the clipping window, the destination block, the cropped source beginning, and the cropped size)

The bottom edge of the destination block is cropped to the bottom of the clipping window. If necessary, Size.y is adjusted. If the result is less than or equal to zero, the routine exits.

    L_04:
        jmpt    Size.y, L_08
        sub     Size.y, Size.y, 1
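The four cropping steps can be restated in C. This is a sketch that follows the register arithmetic above rather than any AMD source: the struct and function names are hypothetical, Dest.y is treated as the top row with rows running downward to Dest.y - Size.y + 1 (as the comparison against GP.wnd.max_y for the top edge implies), and the early exits to L_08 are folded into a return value of 0.

```c
#include <assert.h>

/* Hypothetical types; field order follows the G29K_Params and PARAM order. */
typedef struct { int min_x, max_x, min_y, max_y; } clip_window;
typedef struct { int dest_x, dest_y, size_x, size_y, source_x, source_y; } blit_args;

/* Crops the destination, and the source origin with it, to the window.
   Returns 0 when nothing is left to draw. */
static int clip_blit(blit_args *b, const clip_window *w)
{
    int t;

    assert(w->min_x <= w->max_x && w->min_y <= w->max_y);

    t = b->dest_x - w->min_x;                     /* left edge */
    if (t < 0) {
        b->size_x += t;
        b->source_x -= t;
        b->dest_x = w->min_x;
    }
    t = w->max_x - (b->dest_x + b->size_x - 1);   /* right edge */
    if (t < 0)
        b->size_x += t;
    if (b->size_x <= 0)
        return 0;

    t = w->max_y - b->dest_y;                     /* top edge */
    if (t < 0) {
        b->size_y += t;
        b->source_y += t;
        b->dest_y = w->max_y;
    }
    t = (b->dest_y - b->size_y + 1) - w->min_y;   /* bottom edge */
    if (t < 0)
        b->size_y += t;
    return b->size_y > 0;
}
```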

Routine S_M1_01 is called to convert the destination address to a linear address. The linear destination address is left in BP.dst.lft_addr.

        add     LP.loc.x, Dest.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Dest.y, 0
        add     BP.dst.lft_addr, LP.loc.addr, 0

Routine S_M1_01 is called a second time to convert the source address to a linear address. The linear address of the source is left in variable BP.src.lft_addr. This address and BP.dst.lft_addr point to the left edge of the source and destination bit maps, respectively. They will be modified at the end of each scan line by Source.w and GP.mem.width, respectively.

        add     GP.mem.width, Source.w, 0
        add     GP.wnd.base, Source.b, 0
        add     LP.loc.x, Source.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Source.y, 0
        add     BP.src.lft_addr, LP.loc.addr, 0
        add     GP.mem.width, Temp2, 0

The number of groups per scan line is calculated by right 
shifting the x dimension of the block size. The number of 
pixels in the first (or only) group is calculated. 

        and     BP.fst.count, Size.x, (MAX_WORDS - 1)
        srl     BP.grp.repeat, Size.x, MAX_SHIFT
        sub     BP.grp.repeat, BP.grp.repeat, 1
        sll     BP.fst.incr, BP.fst.count, 2
        add     BP.fst.incr, BP.fst.incr, 4

    L_05:
        add     BP.grp.count, BP.grp.repeat, 0
        add     BP.dst.addr, BP.dst.lft_addr, 0
        add     BP.src.addr, BP.src.lft_addr, 0
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.dst.array, BP.src.addr
        mtsr    cr, BP.fst.count
        jmpt    BP.grp.count, L_07
        storem  0, 0, BP.dst.array, BP.dst.addr
        add     BP.dst.addr, BP.dst.addr, BP.fst.incr
        add     BP.src.addr, BP.src.addr, BP.fst.incr
        sub     BP.grp.count, BP.grp.count, 1



The movement of each scan line begins at label L_05. The current values of the group count and source and destination addresses are copied into working registers.

    L_06:
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.dst.array, BP.src.addr
        add     BP.src.addr, BP.src.addr, (MAX_WORDS * 4)
        mtsrim  cr, (MAX_WORDS - 1)
        storem  0, 0, BP.dst.array, BP.dst.addr
        jmpfdec BP.grp.count, L_06
        add     BP.dst.addr, BP.dst.addr, (MAX_WORDS * 4)

The first (or only) group of the scan line is loaded and then stored. If the first group is the only group, the code jumps to L_07; otherwise, the source and destination addresses are incremented by the number of bytes in the first group, and the group count is decremented.

The loop that moves the second and subsequent groups of each scan line begins at label L_06. Each group moved in the loop is of maximum size; the odd group, if any, was moved first. The group is loaded into the register block and the source address is incremented. The group is stored from the register block, and the destination address is incremented. The loop count is decremented and tested.

    L_07:
        add     BP.dst.lft_addr, BP.dst.lft_addr, GP.mem.width
        jmpfdec Size.y, L_05
        add     BP.src.lft_addr, BP.src.lft_addr, Source.w
    L_08:

At label L_07, the pixels for a scan line have all been moved. The left-edge addresses, BP.dst.lft_addr and BP.src.lft_addr, are incremented by the widths of the respective bit maps, and the scan line count is decremented and tested. If there are more scan lines, control returns to label L_05; otherwise, the routine exits at label L_08.

P_B2_01.S 

Routine P_B2_01.S is a C-language callable program that performs a general BITBLT operation in a color (32-plane) bit map. No clipping is performed. The calling program is expected to supply the address of a routine that actually combines the source and destination arrays according to the desired operation.

The routine begins with the normal global declarations.






Ten parameters are loaded from the structure G29K_Params:

    GP.pxl.value     lr6
    GP.mem.width     lr7
    GP.mem.depth     lr8
    GP.wnd.base      lr9
    GP.wnd.align     lr10
    GP.pxl.op_vec    lr11
    GP.pxl.in_mask   lr12
    GP.pxl.do_mask   lr13
    GP.pxl.do_value  lr14
    GP.pxl.out_mask  lr15

        const   Temp1, _G29K_Params + (4*4)
        consth  Temp1, _G29K_Params + (4*4)
        mtsrim  cr, (10 - 1)
        loadm   0, 0, GP.pxl.value, Temp1

The size of the block is tested to make sure it is greater 
than zero. If it is less than or equal to zero, the routine 
exits immediately. 

        sub     Size.x, Size.x, 1
        jmpt    Size.x, L_04
        sub     Size.y, Size.y, 1
        jmpt    Size.y, L_04
        sub     Size.y, Size.y, 1

Routine S_M1_01 is called to convert the destination address to a linear address. The linear destination address is left in BP.dst.lft_addr.

        add     LP.loc.x, Dest.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Dest.y, 0
        add     BP.dst.lft_addr, LP.loc.addr, 0

Routine S_M1_01 is called a second time to convert the source address to a linear address. The linear address of the source is left in variable BP.src.lft_addr. This address and BP.dst.lft_addr point to the left edge of the source and destination bit maps, respectively. They will be modified at the end of each scan line by Source.w and GP.mem.width, respectively.

        add     Temp1, GP.mem.width, 0
        add     GP.mem.width, Source.w, 0
        add     GP.wnd.base, Source.b, 0
        add     LP.loc.x, Source.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Source.y, 0
        add     BP.src.lft_addr, LP.loc.addr, 0
        add     GP.mem.width, Temp1, 0

The number of groups in each scan line and the number of pixels in the first group are calculated. If a scan line is not an integer number of groups, the odd pixels are done in the first group.

        and     BP.fst.count, Size.x, (MAX_WORDS - 1)
        srl     BP.grp.repeat, Size.x, MAX_SHIFT
        sub     BP.grp.repeat, BP.grp.repeat, 1
        sll     BP.fst.incr, BP.fst.count, 2
        subr    BP.fst.skip, BP.fst.incr, ((MAX_WORDS - 1) * 4)
        add     BP.fst.incr, BP.fst.incr, 4

    L_01:
        add     BP.grp.count, BP.grp.repeat, 0
        add     BP.dst.addr, BP.dst.lft_addr, 0
        add     BP.src.addr, BP.src.lft_addr, 0
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.dst.array, BP.dst.addr
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.src.array, BP.src.addr
        calli   ret, GP.pxl.op_vec
        add     BP.grp.op_skip, BP.fst.skip, 0
        mtsr    cr, BP.fst.count
        jmpt    BP.grp.count, L_03
        storem  0, 0, BP.dst.array, BP.dst.addr
        add     BP.dst.addr, BP.dst.addr, BP.fst.incr
        add     BP.src.addr, BP.src.addr, BP.fst.incr
        sub     BP.grp.count, BP.grp.count, 1

The top of the loop for each scan line is L_01. The left addresses of the source and destination are copied into working registers, and the group count is copied into a working register.



    L_02:
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.dst.array, BP.dst.addr
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.src.array, BP.src.addr
        add     BP.src.addr, BP.src.addr, (MAX_WORDS * 4)
        calli   ret, GP.pxl.op_vec
        const   BP.grp.op_skip, 0
        mtsrim  cr, (MAX_WORDS - 1)
        storem  0, 0, BP.dst.array, BP.dst.addr
        jmpfdec BP.grp.count, L_02
        add     BP.dst.addr, BP.dst.addr, (MAX_WORDS * 4)



The source pixels for the first group are moved into the source-register block. The destination pixels for the first group are moved into the destination-register block. The user routine is called to perform the operation on the two operands. An example of such a routine is O4_02.S, which is included on the distribution diskette. This particular routine adds the two operands as 32-bit unsigned numbers. If the result overflows the 32-bit register, it is forced to all 1s; thus, an add with saturation is done. The conditional assembly in O4_02.S avoids register destruction when MAX_WORDS is set to less than 32.

When the user-supplied routine returns, the data to be 
stored are in the destination-register block, and the 
store takes place. 
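The behavior described for O4_02.S can be pictured with a C version. This is an illustration of an add-with-saturate operation over one group, not the contents of O4_02.S: op_add_saturate and its parameters are hypothetical, and the real routine works on the register blocks rather than on memory arrays.

```c
#include <assert.h>
#include <stdint.h>

/* dst[i] = dst[i] + src[i], forced to all 1s on unsigned overflow. */
static void op_add_saturate(uint32_t *dst, const uint32_t *src, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        uint32_t sum = dst[i] + src[i];              /* wraps modulo 2^32 */
        dst[i] = (sum < dst[i]) ? UINT32_MAX : sum;  /* wrap means carry out */
    }
}
```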

The routine determines whether the first group is the only group. If so, the scan line is complete and control transfers to L_03. If more groups are needed for the scan line, the source and destination addresses are adjusted by the amount transferred in the first group, and the group count is decremented.

At label L_02, the remaining groups are transferred. The destination-register block is loaded, the source-register block is loaded, and the source address is modified. Then the user routine is called, the destination-register block is written, and the destination address is modified. The group count is decremented and tested, and more groups are transferred, if necessary.

    L_03:
        add     BP.dst.lft_addr, BP.dst.lft_addr, GP.mem.width
        jmpfdec Size.y, L_01
        add     BP.src.lft_addr, BP.src.lft_addr, Source.w
    L_04:

At label L_03, the source and destination left addresses are adjusted, and the scan-line count is decremented and tested. If more scan lines are needed, control returns to L_01; otherwise, the routine exits at L_04.

P_B2_02.S

Routine P_B2_02.S is a C-language callable program that performs a general BITBLT operation in a color (32-plane) bit map. Clipping is performed.

The routine begins with the normal global declarations.

Fourteen parameters are loaded from the structure G29K_Params:



    GP.wnd.min_x     lr2
    GP.wnd.max_x     lr3
    GP.wnd.min_y     lr4
    GP.wnd.max_y     lr5
    GP.pxl.value     lr6
    GP.mem.width     lr7
    GP.mem.depth     lr8
    GP.wnd.base      lr9
    GP.wnd.align     lr10
    GP.pxl.op_vec    lr11
    GP.pxl.in_mask   lr12
    GP.pxl.do_mask   lr13
    GP.pxl.do_value  lr14
    GP.pxl.out_mask  lr15

        const   Temp1, _G29K_Params
        consth  Temp1, _G29K_Params
        mtsrim  cr, (14 - 1)
        loadm   0, 0, GP.wnd.min_x, Temp1

The routine determines the size and location of the destination region that overlaps the clipping window. This is similar to the code in P_B1_02.S.

The left edge of the destination block is cropped to the left edge of the window: Size.x is reduced, Source.x is increased, and Dest.x is set to the left edge of the window.

        sub     Temp1, Dest.x, GP.wnd.min_x
        jmpf    Temp1, L_01
        add     Temp2, GP.mem.width, 0
        add     Size.x, Size.x, Temp1
        sub     Source.x, Source.x, Temp1
        add     Dest.x, GP.wnd.min_x, 0

The right edge of the destination block is cropped to the right edge of the clipping window, and Size.x is reduced if necessary. If the result is less than or equal to zero, the routine exits.



    L_01:
        add     Temp1, Dest.x, Size.x
        sub     Temp1, Temp1, 1
        sub     Temp1, GP.wnd.max_x, Temp1
        jmpf    Temp1, L_02
        sub     Size.x, Size.x, 1
        add     Size.x, Size.x, Temp1



The top edge of the destination block is cropped to the top of the clipping window. If necessary, Source.y and Size.y are adjusted so that the beginning of the destination and source correspond to the beginning of the window.



    L_02:
        jmpt    Size.x, L_08
        sub     Temp1, GP.wnd.max_y, Dest.y
        jmpf    Temp1, L_03
        sub     Temp3, GP.wnd.min_y, 1
        add     Size.y, Size.y, Temp1
        add     Source.y, Source.y, Temp1
        add     Dest.y, GP.wnd.max_y, 0
    L_03:
        sub     Temp1, Dest.y, Size.y
        sub     Temp1, Temp1, Temp3
        jmpf    Temp1, L_04
        sub     Size.y, Size.y, 1
        add     Size.y, Size.y, Temp1

The bottom edge of the destination block is cropped to the bottom of the clipping window. If necessary, Size.y is adjusted. If the result is less than or equal to zero, the routine exits.

Routine S_M1_01 is called to convert the destination address to a linear address. The linear destination address is left in BP.dst.lft_addr.



    L_04:
        jmpt    Size.y, L_08
        sub     Size.y, Size.y, 1
        add     LP.loc.x, Dest.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Dest.y, 0
        add     BP.dst.lft_addr, LP.loc.addr, 0



Routine S_M1_01 is called a second time to convert the source address to a linear address. The linear address of the source is left in variable BP.src.lft_addr. This address and BP.dst.lft_addr point to the left edge of the source and destination bit maps, respectively. They will be modified at the end of each scan line by Source.w and GP.mem.width, respectively.

        add     GP.mem.width, Source.w, 0
        add     GP.wnd.base, Source.b, 0
        add     LP.loc.x, Source.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Source.y, 0
        add     BP.src.lft_addr, LP.loc.addr, 0
        add     GP.mem.width, Temp2, 0

The number of groups per scan line is calculated by right shifting the x dimension of the block size. The number of pixels in the first (or only) group is calculated.

        and     BP.fst.count, Size.x, (MAX_WORDS - 1)
        srl     BP.grp.repeat, Size.x, MAX_SHIFT
        sub     BP.grp.repeat, BP.grp.repeat, 1
        sll     BP.fst.incr, BP.fst.count, 2
        subr    BP.fst.skip, BP.fst.incr, ((MAX_WORDS - 1) * 4)
        add     BP.fst.incr, BP.fst.incr, 4

    L_05:
        add     BP.grp.count, BP.grp.repeat, 0
        add     BP.dst.addr, BP.dst.lft_addr, 0
        add     BP.src.addr, BP.src.lft_addr, 0
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.dst.array, BP.dst.addr
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.src.array, BP.src.addr
        calli   ret, GP.pxl.op_vec
        add     BP.grp.op_skip, BP.fst.skip, 0
        mtsr    cr, BP.fst.count
        jmpt    BP.grp.count, L_07
        storem  0, 0, BP.dst.array, BP.dst.addr
        add     BP.dst.addr, BP.dst.addr, BP.fst.incr
        add     BP.src.addr, BP.src.addr, BP.fst.incr
        sub     BP.grp.count, BP.grp.count, 1

Label L_05 is the beginning of each scan line. A set of working registers is set to the number of groups in the scan line and to the beginning source and destination addresses for the scan line.



    L_06:
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.dst.array, BP.dst.addr
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.src.array, BP.src.addr
        add     BP.src.addr, BP.src.addr, (MAX_WORDS * 4)
        calli   ret, GP.pxl.op_vec
        const   BP.grp.op_skip, 0
        mtsrim  cr, (MAX_WORDS - 1)
        storem  0, 0, BP.dst.array, BP.dst.addr
        jmpfdec BP.grp.count, L_06
        add     BP.dst.addr, BP.dst.addr, (MAX_WORDS * 4)



One group of the destination operands is moved into the destination-register block, and one group of the source operands is moved into the source-register block. The user routine is called to perform the operation on the blocks.

When the user routine returns with the finished destination data, the destination field in memory is written. The group count is tested. If this is the only group in the scan line, the code continues at label L_07. If more groups are required, the working source and destination addresses are advanced past the first group, and the group count is reduced.

At label L_06, the remaining groups are moved. For each group, the destination and source operands are moved into the register blocks, and the user routine is called. When the routine returns, the results are stored, and the group count is decremented and tested.

    L_07:
        add     BP.dst.lft_addr, BP.dst.lft_addr, GP.mem.width
        jmpfdec Size.y, L_05
        add     BP.src.lft_addr, BP.src.lft_addr, Source.w
    L_08:

At label L_07, the left-edge addresses are adjusted, and the scan line count is reduced and tested. If more scan lines are required, control returns to label L_05; otherwise, the routine exits at label L_08.

P_B3_01.S

Routine P_B3_01.S is a C-language callable program that performs a copy-block operation in a monochrome (1-plane) bit map. No clipping is performed. This routine is optimized for moving data.

The routine begins with the normal global declarations.

Four parameters are loaded from the structure G29K_Params:



    GP.mem.width    lr7
    GP.mem.depth    lr8
    GP.wnd.base     lr9
    GP.wnd.align    lr10

        const   Temp1, _G29K_Params + (5*4)
        consth  Temp1, _G29K_Params + (5*4)
        mtsrim  cr, (4 - 1)
        loadm   0, 0, GP.mem.width, Temp1

The routine checks that the size of the block is positive in both dimensions. If either dimension is zero or negative, it exits immediately.

        cple    BP.grp.repeat, Size.x, 0
        jmpt    BP.grp.repeat, L_16
        sub     Size.y, Size.y, 1
        jmpt    Size.y, L_16
        sub     Size.y, Size.y, 1

Routine S_M1_01 is called to convert the destination address to a linear address. The linear destination address is left in BP.dst.lft_addr, and the alignment is left in LP.loc.align.

        add     LP.loc.x, Dest.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Dest.y, 0
        add     BP.dst.lft_addr, LP.loc.addr, 0
        add     BP.dst.align, LP.loc.align, 0

Routine S_M1_01 is called a second time to convert the source address to a linear address. The linear address of the source is left in variable BP.src.lft_addr, and the source alignment is left in LP.loc.align. This address and BP.dst.lft_addr point to the left edge of the source and destination bit maps, respectively. They will be modified once per scan line by Source.w and GP.mem.width, respectively.

        add     BP.grp.count, GP.mem.width, 0
        add     GP.mem.width, Source.w, 0
        add     GP.wnd.base, Source.b, 0
        add     GP.wnd.align, Source.a, 0
        add     LP.loc.x, Source.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Source.y, 0
        add     GP.mem.width, BP.grp.count, 0
        add     BP.src.lft_addr, LP.loc.addr, 0



The amount that the source must be shifted to align with the destination is calculated and left in BP.src.shift.

        sub     BP.src.shift, LP.loc.align, BP.dst.align

Masks are generated for the left and right ends of the destination field. These are used at the beginning and end of each scan line to avoid affecting the parts of partial words that are not actually inside the destination. The bit extent of the scan line (BP.dst.align + Size.x), rounded up by 31, is also left in BP.grp.repeat for the word count calculated next.

        constn  BP.dst.rgt_mask, -1
        srl     BP.dst.lft_mask, BP.dst.rgt_mask, BP.dst.align
        add     BP.grp.align, BP.dst.align, Size.x
        add     BP.grp.repeat, BP.grp.align, 31
        subr    BP.grp.align, BP.grp.align, 32
        and     BP.grp.align, BP.grp.align, 31
        sll     BP.dst.rgt_mask, BP.dst.rgt_mask, BP.grp.align
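The mask arithmetic can be checked with a C sketch. It mirrors the instruction sequence above under the assumption, consistent with BP.dst.align counting bits from the left, that the leftmost pixel is the most significant bit; the function names are hypothetical. A set bit marks a pixel inside the destination, so the andn instructions that follow keep only the bits outside it, and merge shows how the two parts would be recombined.

```c
#include <assert.h>
#include <stdint.h>

/* srl BP.dst.lft_mask, -1, BP.dst.align */
static uint32_t left_mask(unsigned dst_align)
{
    assert(dst_align < 32u);
    return 0xFFFFFFFFu >> dst_align;
}

/* sll BP.dst.rgt_mask, -1, (32 - (align + Size.x)) & 31 */
static uint32_t right_mask(unsigned dst_align, unsigned size_x)
{
    unsigned spare = (32u - (dst_align + size_x)) & 31u;  /* bits past the edge */
    return 0xFFFFFFFFu << spare;
}

/* Words touched by one scan line: (align + Size.x + 31) >> 5. */
static unsigned words_spanned(unsigned dst_align, unsigned size_x)
{
    return (dst_align + size_x + 31u) >> 5;
}

/* Keep old bits outside the mask, take new bits inside it. */
static uint32_t merge(uint32_t old_word, uint32_t new_word, uint32_t mask)
{
    return (old_word & ~mask) | (new_word & mask);
}
```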

The number of groups in each scan line is calculated. Each group except the first contains exactly MAX_WORDS words (32 in the default configuration); any "extra" words are in the first group.

        srl     BP.grp.repeat, BP.grp.repeat, 5
        sub     BP.grp.repeat, BP.grp.repeat, 1
        and     BP.fst.count, BP.grp.repeat, (MAX_WORDS - 1)
        srl     BP.grp.repeat, BP.grp.repeat, MAX_SHIFT
        sub     BP.grp.repeat, BP.grp.repeat, 1
        sll     BP.fst.incr, BP.fst.count, 2
        subr    BP.fst.skip, BP.fst.incr, (4 * (MAX_WORDS - 1))
        const   BP.fst.shift, L_09
        consth  BP.fst.shift, L_09
        add     BP.fst.shift, BP.fst.shift, BP.fst.skip
        setip   BP.dst.array, BP.src.array, BP.src.array
        mfsr    BP.src.rgt_ptr, ipa
        add     BP.src.rgt_ptr, BP.src.rgt_ptr, BP.fst.incr
        add     BP.fst.incr, BP.fst.incr, 4
        cpgt    BP.grp.align, BP.src.shift, 0
        add     BP.src.shift, BP.src.shift, 32
        and     BP.src.shift, BP.src.shift, 31
        cpeq    BP.src.save, BP.src.shift, 0
        or      BP.src.shift, BP.src.shift, BP.src.save

For the first group, there is a sequence of shift instructions beginning at L_09. Since the first group may not contain all MAX_WORDS words, an indirect jump address into the sequence is generated and left in BP.fst.shift. The indirect address pointers are set to the source array and destination array. The register address of the right-most word of the first group in the source array is computed and left in BP.src.rgt_ptr.

    L_05:
        add     BP.grp.count, BP.grp.repeat, 0
        add     BP.dst.addr, BP.dst.lft_addr, 0
        add     BP.src.addr, BP.src.lft_addr, 0
        load    0, 0, BP.dst.lft_end, BP.dst.addr
        jmpf    BP.grp.count, L_06
        andn    BP.dst.lft_end, BP.dst.lft_end, BP.dst.lft_mask
        add     Temp1, BP.dst.addr, BP.fst.incr
        sub     Temp1, Temp1, 4
        load    0, 0, BP.dst.rgt_end, Temp1
        andn    BP.dst.rgt_end, BP.dst.rgt_end, BP.dst.rgt_mask

The sign of the relative alignment is placed into BP.grp.align; it will be used to determine whether to do a left or right shift. BP.src.shift is set to the value to be loaded into the funnel count register for use with the extract instructions when shifting the source registers. The value is appropriate for either left or right shifting. If the value of the shift amount is zero (no shifting is needed), the sign bit is set for use with a conditional jump prior to the sequence of extract instructions.

Each scan line begins at L_05. Copies of the source and destination addresses of the left edge of the scan line and of the group count are moved into working registers.

    L_06:
        jmpf    BP.grp.align, L_07
        mtsr    ipa, BP.src.rgt_ptr
        add     BP.src.extra, BP.fst.count, 1
        mtsr    cr, BP.src.extra
        loadm   0, 0, BP.src.extra, BP.src.addr
        jmp     L_08
        add     BP.src.addr, BP.src.addr, 4

The word at the left end of the first group of the destination (possibly an incomplete word) is fetched, and the portion outside the destination is retained for storing to memory. If the first group is the only group, the word at the right end is fetched and masked as well.

The alignment direction flag in BP.grp.align is tested. A right shift results in a jump to L_07. If a left shift is necessary, an extra word is loaded into a register that is prefixed to the source register block. The remaining words of the source array are loaded into the register block.



L_07:
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.src.array, BP.src.addr

At label L_08, the right-most source word of the first
group is saved to be prefixed to the next group.

L_08:
        jmpt    BP.src.shift, L_10
        add     BP.src.save, gr0, 0
        jmpi    BP.fst.shift
        mtsr    fc, BP.src.shift


If the source and destination are identically aligned, no
shifts are necessary and the code jumps over the shift
array to label L_10. This is the case regardless of the
absolute alignment of the operands, which is handled by
the special treatment of words at the left and right ends
of the scan line.

L_09:
.if MAX_WORDS == 32
        extract BP.src.array_31, BP.src.array_30, BP.src.array_31
        ...
        extract BP.src.array_16, BP.src.array_15, BP.src.array_16
.endif

.if MAX_WORDS >= 16
        extract BP.src.array_15, BP.src.array_14, BP.src.array_15
        ...
        extract BP.src.array_08, BP.src.array_07, BP.src.array_08
.endif

.if MAX_WORDS >= 8
        extract BP.src.array_07, BP.src.array_06, BP.src.array_07
        extract BP.src.array_06, BP.src.array_05, BP.src.array_06
        extract BP.src.array_05, BP.src.array_04, BP.src.array_05
        extract BP.src.array_04, BP.src.array_03, BP.src.array_04
.endif

        extract BP.src.array_03, BP.src.array_02, BP.src.array_03
        extract BP.src.array_02, BP.src.array_01, BP.src.array_02
        extract BP.src.array_01, BP.src.array_00, BP.src.array_01
        extract BP.src.array_00, BP.src.extra, BP.src.array_00

If a shift is necessary, the code jumps into the sequence
of shift instructions at the entry point held in
BP.fst.shift. Each instruction either left- or right-shifts
two adjacent registers in the source block and leaves the
result in the register that is further to the right.
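The shift sequence can be sketched in C under one assumption about EXTRACT: that it takes the 64-bit concatenation of its two source registers, shifts it left by the funnel count, and keeps the upper 32 bits. The names funnel_extract and shift_block are hypothetical. Walking the block from right to left, as the listing does, means each step still reads the unmodified register to its left.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of EXTRACT: upper 32 bits of (left:right) << fc. */
static uint32_t funnel_extract(uint32_t left, uint32_t right, unsigned fc)
{
    uint64_t pair = ((uint64_t)left << 32) | right;
    assert(fc < 32);
    return (uint32_t)((pair << fc) >> 32);
}

/* Shift n words in place, like the unrolled extract sequence.
   extra plays the role of BP.src.extra, the register prefixed to the block.
   Processing runs right to left so each step sees its unmodified left neighbour. */
static void shift_block(uint32_t extra, uint32_t *words, int n, unsigned fc)
{
    for (int i = n - 1; i > 0; i--)
        words[i] = funnel_extract(words[i - 1], words[i], fc);
    if (n > 0)
        words[0] = funnel_extract(extra, words[0], fc);
}
```

With a funnel count of 24, the words 0x11223344 and 0x55667788 preceded by 0x000000AA become 0xAA112233 and 0x44556677; the bit stream has moved eight bits to the right.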

At label L_10, the bits outside the destination (in the
left-most word) are placed into the left-most word of the
source array. If there is only one group, the bits outside
the destination (in the right-most word) are placed into
the right-most word of the source array.

L_10:
        and     BP.src.array, BP.src.array, BP.dst.lft_mask
        jmpf    BP.grp.count, L_11
        or      BP.src.array, BP.src.array, BP.dst.lft_end
        mtsr    ipa, BP.src.rgt_ptr
        mtsr    ipc, BP.src.rgt_ptr
        nop
        and     gr0, gr0, BP.dst.rgt_mask
        or      gr0, gr0, BP.dst.rgt_end
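The masking at L_10 amounts to a bitwise merge: a 1 bit in the mask selects the new (shifted source) bit, and a 0 bit keeps the old destination bit that was saved earlier with andn. A minimal C sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the andn / and / or sequence used at the ends of a scan line:
     andn end, end, mask    ; old destination bits outside the block
     and  word, word, mask  ; new bits inside the block
     or   word, word, end   ; combine                                  */
static uint32_t merge_edge(uint32_t new_word, uint32_t old_dst, uint32_t mask)
{
    uint32_t outside = old_dst & ~mask;
    return (new_word & mask) | outside;
}
```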

At label L_11, the resulting register block is written into
the destination bit map. If there is only a single group per
scan line, the code jumps to L_15.

L_11:
        mtsr    cr, BP.fst.count
        jmpt    BP.grp.count, L_15
        storem  0, 0, BP.src.array, BP.dst.addr
        add     BP.dst.addr, BP.dst.addr, BP.fst.incr
        add     BP.src.addr, BP.src.addr, BP.fst.incr
        sub     BP.grp.count, BP.grp.count, 1

If there is more than one group per scan line, the
addresses are adjusted by the amount moved in the first
group, and the code enters a loop at L_12 to move the
rest of the scan line 32 words at a time.

At label L_12, the code determines whether this is the
last group. If so, the right-most word is fetched from the
destination, and the bits outside the destination block are
preserved.

L_12:
        jmpf    BP.grp.count, L_12a
        add     BP.src.extra, BP.src.save, 0
        add     Temp1, BP.dst.addr, (4 * (MAX_WORDS - 1))
        load    0, 0, BP.dst.rgt_end, Temp1
        andn    BP.dst.rgt_end, BP.dst.rgt_end, BP.dst.rgt_mask

At label L_12a, the right-most word from the previous 
group is prefixed, and then the 32 words are fetched 
from the source bit map. The source address is incre- 
mented, and the right-most word is saved for the next 
group. 

L_12a:
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.src.array, BP.src.addr
        add     BP.src.addr, BP.src.addr, (4 * MAX_WORDS)
        jmpt    BP.src.shift, L_13
        add     BP.src.save, BP.src.array_end, 0
        mtsr    fc, BP.src.shift

.if MAX_WORDS == 32
        extract BP.src.array_31, BP.src.array_30, BP.src.array_31
        ...
        extract BP.src.array_16, BP.src.array_15, BP.src.array_16
.endif

.if MAX_WORDS >= 16
        extract BP.src.array_15, BP.src.array_14, BP.src.array_15
        ...
        extract BP.src.array_08, BP.src.array_07, BP.src.array_08
.endif

.if MAX_WORDS >= 8
        extract BP.src.array_07, BP.src.array_06, BP.src.array_07
        extract BP.src.array_06, BP.src.array_05, BP.src.array_06
        extract BP.src.array_05, BP.src.array_04, BP.src.array_05
        extract BP.src.array_04, BP.src.array_03, BP.src.array_04
.endif

        extract BP.src.array_03, BP.src.array_02, BP.src.array_03
        extract BP.src.array_02, BP.src.array_01, BP.src.array_02
        extract BP.src.array_01, BP.src.array_00, BP.src.array_01
        extract BP.src.array_00, BP.src.extra, BP.src.array_00
If no shift is necessary (the source and destination are
identically aligned), the shift array is skipped; otherwise,
the source words are shifted.

At label L_13, if this is the last group, the right-most
destination word (fetched at L_12) is masked and merged.

L_13:
        jmpf    BP.grp.count, L_14
        mtsrim  cr, (MAX_WORDS - 1)
        and     BP.src.array_end, BP.src.array_end, BP.dst.rgt_mask
        or      BP.src.array_end, BP.src.array_end, BP.dst.rgt_end

At label L_14, the group is written into the destination bit
map and the destination address is advanced. The group
count is decremented and tested. If further groups are
needed for the scan line, they are moved beginning at
label L_12.

L_14:
        storem  0, 0, BP.src.array, BP.dst.addr
        jmpfdec BP.grp.count, L_12
        add     BP.dst.addr, BP.dst.addr, (4 * MAX_WORDS)

L_15:
        add     BP.dst.lft_addr, BP.dst.lft_addr, GP.mem.width
        jmpfdec Size.y, L_05
        add     BP.src.lft_addr, BP.src.lft_addr, Source.w
L_16:

The scan-line count is decremented and tested. If fur- 
ther scan lines are necessary, the address of the left 
edge of each is calculated at label L_05. 
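To check the overall scheme, the work done for one scan line can be modelled in C one word at a time, without the 32-word register groups, and compared with a pixel-by-pixel copy. This is an illustrative sketch only: all function names are hypothetical, scan lines are plain arrays of 32-bit words, and the leftmost pixel of a word is taken to be its most significant bit, which is what the masks built with srl and sll imply.

```c
#include <assert.h>
#include <stdint.h>

/* Pixel p is bit 31 - (p % 32) of word p / 32 (leftmost pixel = MSB). */
static int get_pixel(const uint32_t *line, unsigned p)
{
    return (int)((line[p >> 5] >> (31 - (p & 31))) & 1u);
}

static void set_pixel(uint32_t *line, unsigned p, int v)
{
    uint32_t bit = 0x80000000u >> (p & 31);
    if (v)
        line[p >> 5] |= bit;
    else
        line[p >> 5] &= ~bit;
}

/* Reference version: copy w pixels one at a time. */
static void copy_span_bits(const uint32_t *src, unsigned sx,
                           uint32_t *dst, unsigned dx, unsigned w)
{
    for (unsigned i = 0; i < w; i++)
        set_pixel(dst, dx + i, get_pixel(src, sx + i));
}

/* The 32 source bits starting at bit offset s (s >= -31);
   bits before the start of the line read as 0. */
static uint32_t fetch32(const uint32_t *src, unsigned src_words, long s)
{
    if (s < 0)
        return src[0] >> (unsigned)(-s);
    unsigned wi = (unsigned)(s >> 5);
    unsigned off = (unsigned)(s & 31);
    uint64_t pair = (uint64_t)src[wi] << 32;
    if (wi + 1 < src_words)                    /* stay inside the source line */
        pair |= src[wi + 1];
    return (uint32_t)((pair << off) >> 32);    /* funnel shift */
}

/* Word version: each destination word receives 32 aligned source bits,
   and the end masks preserve destination pixels outside the block. */
static void copy_span_words(const uint32_t *src, unsigned src_words, unsigned sx,
                            uint32_t *dst, unsigned dx, unsigned w)
{
    if (w == 0)
        return;
    unsigned align = dx & 31;
    unsigned first = dx >> 5, last = (dx + w - 1) >> 5;
    uint32_t lft_mask = 0xFFFFFFFFu >> align;
    uint32_t rgt_mask = 0xFFFFFFFFu << ((32u - (align + w)) & 31u);

    for (unsigned i = first; i <= last; i++) {
        uint32_t mask = 0xFFFFFFFFu;
        if (i == first)
            mask &= lft_mask;
        if (i == last)
            mask &= rgt_mask;
        long s = 32L * (long)i - (long)dx + (long)sx;
        uint32_t v = fetch32(src, src_words, s);
        dst[i] = (v & mask) | (dst[i] & ~mask);
    }
}
```

Both versions should leave identical destination lines for any alignment and width; the listing's group structure batches the same word operations into LOADM and STOREM transfers of up to 32 words.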

P_B3_02.S 

Routine P_B3_02.S is a C-language callable program
that performs a copy-block operation in a monochrome
(1-plane) bit map. Clipping is performed. This routine is
optimized for moving data. The routine begins with the
normal global functions.

Nine parameters are loaded from the structure
G29K_Params.

GP.wnd.min_x    lr2
GP.wnd.max_x    lr3
GP.wnd.min_y    lr4
GP.wnd.max_y    lr5
GP.pxl.value    lr6
GP.mem.width    lr7
GP.mem.depth    lr8
GP.wnd.base     lr9
GP.wnd.align    lr10

        const   Temp1, _G29K_Params
        consth  Temp1, _G29K_Params
        mtsrim  cr, (9 - 1)
        loadm   0, 0, GP.wnd.min_x, Temp1

The destination array is cropped so that it consists only
of the part of the original destination array that lies
within the clipping window.

The left edge of the destination array is cropped, if
necessary, to the left edge of the clipping window. If the
left edge must be cropped, the left edge of the source array
and the block size are adjusted as well.

        sub     Temp1, Dest.x, GP.wnd.min_x
        jmpf    Temp1, L_01
        add     Temp3, GP.wnd.max_x, 1
        add     Size.x, Size.x, Temp1
        sub     Source.x, Source.x, Temp1
        add     Dest.x, GP.wnd.min_x, 0

The right edge of the destination array is cropped, if
necessary, to the right edge of the clipping window. If the
right edge must be adjusted, the block size is adjusted
as well.



L_01:
        add     Temp1, Dest.x, Size.x
        sub     Temp1, Temp3, Temp1
        jmpf    Temp1, L_02
        add     Temp2, GP.mem.width, 0
        add     Size.x, Size.x, Temp1
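The horizontal cropping performed before and at L_01 can be written in C as follows. This is a sketch under stated assumptions: the point structure and the clip_x name are hypothetical stand-ins for Dest, Source, and Size.x, and the window limits are taken to be inclusive, as the addition of 1 to GP.wnd.max_x suggests.

```c
#include <assert.h>

typedef struct { int x, y; } point;   /* hypothetical stand-in for Dest/Source */

/* Crop the block to [min_x, max_x]; returns the clipped width (<= 0: nothing to draw). */
static int clip_x(point *dest, point *source, int *size_x, int min_x, int max_x)
{
    int d = dest->x - min_x;                 /* sub Temp1, Dest.x, GP.wnd.min_x */
    if (d < 0) {                             /* block starts left of the window */
        *size_x += d;
        source->x -= d;
        dest->x = min_x;
    }
    d = (max_x + 1) - (dest->x + *size_x);   /* room left at the right edge */
    if (d < 0)
        *size_x += d;
    return *size_x;
}
```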



If the width of the resulting block is less than or equal to 
zero, the routine exits immediately. 

If necessary, the top edge of the destination array is 
cropped to the top edge of the clipping window. If the top 
edge must be adjusted, the source top and the block 
size are adjusted as well. 



L_02:
        cple    Temp1, Size.x, 0
        jmpt    Temp1, L_16
        sub     Temp1, GP.wnd.max_y, Dest.y
        jmpf    Temp1, L_03
        sub     Temp3, GP.wnd.min_y, 1
        add     Size.y, Size.y, Temp1
        add     Source.y, Source.y, Temp1
        add     Dest.y, GP.wnd.max_y, 0



The bottom edge of the destination array is cropped, if 
necessary, to the bottom edge of the clipping window. 
If the bottom edge must be adjusted, the block size is 
adjusted as well. 



L_03:
        sub     Temp1, Dest.y, Size.y
        sub     Temp1, Temp1, Temp3
        jmpf    Temp1, L_04
        sub     Size.y, Size.y, 1
        add     Size.y, Size.y, Temp1

If the height of the resulting block is less than or equal to 
zero, the routine exits immediately. 



L_04:
        jmpt    Size.y, L_16
        sub     Size.y, Size.y, 1
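The vertical cropping from L_02 to L_04 can be sketched the same way. Following the listing, Dest.y is taken to be the row with the largest y value, the block extends Size.y rows toward GP.wnd.min_y, and Source.y moves by the same amount as Dest.y when the top is cropped. The names are hypothetical, and the extra decrements of Size.y in the listing, which appear to prepare it as a loop counter for jmpfdec, are left out.

```c
#include <assert.h>

typedef struct { int x, y; } point;   /* hypothetical stand-in for Dest/Source */

/* Crop the block to [min_y, max_y]; returns the clipped height (<= 0: nothing to draw). */
static int clip_y(point *dest, point *source, int *size_y, int min_y, int max_y)
{
    int d = max_y - dest->y;                   /* top row above max_y? */
    if (d < 0) {
        *size_y += d;
        source->y += d;
        dest->y = max_y;
    }
    d = (dest->y - *size_y + 1) - min_y;       /* bottom row below min_y? */
    if (d < 0)
        *size_y += d;
    return *size_y;
}
```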



Routine S_M1_01 is called to convert the destination
coordinates to a linear address and alignment. The
results are left in BP.dst.lft_addr and BP.dst.align.

        add     LP.loc.x, Dest.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Dest.y, 0
        add     BP.dst.lft_addr, LP.loc.addr, 0
        add     BP.dst.align, LP.loc.align, 0

Routine S_M1_01 is called a second time to convert the
source coordinates to a linear address and alignment.

        add     GP.mem.width, Source.w, 0
        add     GP.wnd.base, Source.b, 0
        add     GP.wnd.align, Source.a, 0
        add     LP.loc.x, Source.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Source.y, 0
        add     GP.mem.width, Temp2, 0
        add     BP.src.lft_addr, LP.loc.addr, 0

The difference in alignment between the source and 
destination is calculated to determine how far the source 
must be shifted. This is left in BP.src.shift. 

        sub     BP.src.shift, LP.loc.align, BP.dst.align

The masks for the left and right ends of a destination line 
are formed. These will be used to mask bits in the words 
at the ends of each scan line that are not in the destina- 
tion block. 



        constn  BP.dst.rgt_mask, -1
        srl     BP.dst.lft_mask, BP.dst.rgt_mask, BP.dst.align
        add     BP.grp.align, BP.dst.align, Size.x
        add     BP.grp.repeat, BP.grp.align, 31
        subr    BP.grp.align, BP.grp.align, 32
        and     BP.grp.align, BP.grp.align, 31
        sll     BP.dst.rgt_mask, BP.dst.rgt_mask, BP.grp.align
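The mask arithmetic above can be checked with a small C model. The function name is hypothetical; dst_align is the bit offset of the block's left edge within its first word, and a 1 bit in either mask marks a pixel inside the block.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t lft_mask, rgt_mask; } end_masks;

static end_masks make_end_masks(unsigned dst_align, unsigned width)
{
    end_masks m;
    uint32_t ones = 0xFFFFFFFFu;                  /* constn BP.dst.rgt_mask, -1       */
    assert(dst_align < 32);
    m.lft_mask = ones >> dst_align;               /* srl                              */
    unsigned pad = (32u - (dst_align + width)) & 31u; /* add; subr ..., 32; and ..., 31 */
    m.rgt_mask = ones << pad;                     /* sll: unused bits at the right end */
    return m;
}
```

When the whole block fits in one word (for example, an alignment of 4 and a width of 8), both masks apply to the same word, and their intersection, 0x0FF00000, selects exactly the eight destination pixels.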



The number of groups in each scan line is computed.
The number of words in the first or only group of each
scan line is computed. All other groups of each scan line
will be exactly MAX_WORDS (32) words.

        srl     BP.grp.repeat, BP.grp.repeat, 5
        sub     BP.grp.repeat, BP.grp.repeat, 1
        and     BP.fst.count, BP.grp.repeat, (MAX_WORDS - 1)
        srl     BP.grp.repeat, BP.grp.repeat, MAX_SHIFT
        sub     BP.grp.repeat, BP.grp.repeat, 1
        sll     BP.fst.incr, BP.fst.count, 2
        subr    BP.fst.skip, BP.fst.incr, (4 * (MAX_WORDS - 1))
        const   BP.fst.shift, L_09
        consth  BP.fst.shift, L_09
        add     BP.fst.shift, BP.fst.shift, BP.fst.skip
        setip   BP.dst.array, BP.src.array, BP.src.array
        mfsr    BP.src.rgt_ptr, ipa
        add     BP.src.rgt_ptr, BP.src.rgt_ptr, BP.fst.incr
        add     BP.fst.incr, BP.fst.incr, 4
        cpgt    BP.grp.align, BP.src.shift, 0
        add     BP.src.shift, BP.src.shift, 32
        and     BP.src.shift, BP.src.shift, 31
        cpeq    BP.src.save, BP.src.shift, 0
        or      BP.src.shift, BP.src.shift, BP.src.save
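The group bookkeeping can be reproduced in C for a given alignment and width. This sketch assumes MAX_WORDS is 32 and MAX_SHIFT is 5, as the text states; the structure and function names are hypothetical. Under our reading of the loop tests, BP.grp.repeat ends up as the number of groups minus two, so a negative value means a single group, and BP.fst.skip is 4 bytes for each extract instruction the first group does not need.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORDS 32
#define MAX_SHIFT 5            /* log2(MAX_WORDS) */

typedef struct {
    int words;       /* words touched per scan line                     */
    int fst_count;   /* words in the first group, minus 1 (the CR value) */
    int repeat;      /* groups minus 2; negative means a single group    */
    int fst_skip;    /* bytes of the shift sequence to jump over         */
    int fst_incr;    /* bytes moved by the first group                   */
} group_plan;

static group_plan plan_groups(int dst_align, int width)
{
    group_plan g;
    assert(dst_align >= 0 && dst_align < 32 && width > 0);
    g.words     = (dst_align + width + 31) >> 5;
    g.fst_count = (g.words - 1) & (MAX_WORDS - 1);
    g.repeat    = ((g.words - 1) >> MAX_SHIFT) - 1;
    g.fst_skip  = 4 * (MAX_WORDS - 1) - 4 * g.fst_count;
    g.fst_incr  = 4 * (g.fst_count + 1);
    return g;
}
```

For an alignment of 0 and a width of 1280 pixels (40 words), the first group holds 8 words and one full 32-word group follows; the jump into the shift sequence skips 96 bytes, or 24 extract instructions.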

L_05:
        add     BP.grp.count, BP.grp.repeat, 0
        add     BP.dst.addr, BP.dst.lft_addr, 0
        add     BP.src.addr, BP.src.lft_addr, 0
        load    0, 0, BP.dst.lft_end, BP.dst.addr
        jmpf    BP.grp.count, L_06
        andn    BP.dst.lft_end, BP.dst.lft_end, BP.dst.lft_mask
        add     Temp1, BP.dst.addr, BP.fst.incr
        sub     Temp1, Temp1, 4
        load    0, 0, BP.dst.rgt_end, Temp1
        andn    BP.dst.rgt_end, BP.dst.rgt_end, BP.dst.rgt_mask

For the first group, there is a sequence of shift
instructions beginning at L_09. Since the first group may
not contain all 32 words, an indirect jump address into the
sequence is generated and left in BP.fst.shift. The
indirect address pointers are set to the source array and
destination array. The address of the right-most word of
the first scan line of the source array is computed.
The sign of the relative alignment is placed into
BP.grp.align; it will be used to determine whether to do a
left or right shift. BP.src.shift is set to the value to be
loaded into the funnel count register for use with the
extract instructions when shifting the source registers.
The value is appropriate for either left or right shifting.
If the shift amount is zero (that is, no shifting is
needed), the sign bit is set for use by a conditional jump
placed before the sequence of extract instructions.

Each scan line begins at L_05. Copies of the source and
destination addresses of the left edge of the scan line and
the group count are moved into working registers.

L_06:
        jmpf    BP.grp.align, L_07
        mtsr    ipa, BP.src.rgt_ptr
        add     BP.src.extra, BP.fst.count, 1
        mtsr    cr, BP.src.extra
        loadm   0, 0, BP.src.extra, BP.src.addr
        jmp     L_08
        add     BP.src.addr, BP.src.addr, 4

The word at the left end of the first group of the
destination (which may be an incomplete word) is fetched,
and the portion outside the destination is retained and
eventually written back into memory. If the first group is
the only group, the word at the right end is fetched and
masked as well.

The alignment direction flag in BP.grp.align is tested. A
right shift results in a jump to L_07. If a left shift is
necessary, an extra word is loaded into a register that is
prefixed to the source register block. The remaining words
of the source array are loaded into the register block.

L_07:
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.src.array, BP.src.addr

At label L_08, the right-most source word of the first 
group is saved to be prefixed to the next group. 

L_08:
        jmpt    BP.src.shift, L_10
        add     BP.src.save, gr0, 0
        jmpi    BP.fst.shift
        mtsr    fc, BP.src.shift

If the source and destination are identically aligned, no
shifts are necessary and the code jumps over the shift
array to label L_10. This is the case regardless of the
absolute alignment of the operands, which is handled by
the special treatment of words at the left and right ends
of the scan line.

L_09:
.if MAX_WORDS == 32
        extract BP.src.array_31, BP.src.array_30, BP.src.array_31
        ...
        extract BP.src.array_16, BP.src.array_15, BP.src.array_16
.endif

.if MAX_WORDS >= 16
        extract BP.src.array_15, BP.src.array_14, BP.src.array_15
        ...
        extract BP.src.array_08, BP.src.array_07, BP.src.array_08
.endif

.if MAX_WORDS >= 8
        extract BP.src.array_07, BP.src.array_06, BP.src.array_07
        extract BP.src.array_06, BP.src.array_05, BP.src.array_06
        extract BP.src.array_05, BP.src.array_04, BP.src.array_05
        extract BP.src.array_04, BP.src.array_03, BP.src.array_04
.endif

        extract BP.src.array_03, BP.src.array_02, BP.src.array_03
        extract BP.src.array_02, BP.src.array_01, BP.src.array_02
        extract BP.src.array_01, BP.src.array_00, BP.src.array_01
        extract BP.src.array_00, BP.src.extra, BP.src.array_00

If a shift is necessary, the code jumps into the sequence
of shift instructions at the entry point held in
BP.fst.shift. Each instruction either left- or right-shifts
two adjacent registers in the source block and leaves the
result in the register that is further to the right.

At label L_10, the bits outside the destination (in the
left-most word) are placed into the left-most word of the
source array. If there is only one group, the bits outside
the destination (in the right-most word) are placed into
the right-most word of the source array.

L_10:
        and     BP.src.array, BP.src.array, BP.dst.lft_mask
        jmpf    BP.grp.count, L_11
        or      BP.src.array, BP.src.array, BP.dst.lft_end
        mtsr    ipa, BP.src.rgt_ptr
        mtsr    ipc, BP.src.rgt_ptr
        nop
        and     gr0, gr0, BP.dst.rgt_mask
        or      gr0, gr0, BP.dst.rgt_end



At label L_11, the resulting register block is written into
the destination bit map. If there is only a single group per
scan line, the code jumps to L_15.



L_11:
        mtsr    cr, BP.fst.count
        jmpt    BP.grp.count, L_15
        storem  0, 0, BP.src.array, BP.dst.addr
        add     BP.dst.addr, BP.dst.addr, BP.fst.incr
        add     BP.src.addr, BP.src.addr, BP.fst.incr
        sub     BP.grp.count, BP.grp.count, 1

If there is more than one group per scan line, the
addresses are adjusted by the amount moved in the first
group, and the code enters a loop at L_12 to move the
rest of the scan line 32 words at a time.

At label L_12, the code tests whether this is the last
group. If so, the right-most word is fetched from the
destination, and the bits outside the destination block are
preserved.



L_12:
        jmpf    BP.grp.count, L_12a
        add     BP.src.extra, BP.src.save, 0
        add     Temp1, BP.dst.addr, (4 * (MAX_WORDS - 1))
        load    0, 0, BP.dst.rgt_end, Temp1
        andn    BP.dst.rgt_end, BP.dst.rgt_end, BP.dst.rgt_mask



At label L_12a, the right-most word from the previous
group is prefixed, and then the 32 words are fetched
from the source bit map. The source address is
incremented, and the right-most word is saved for the next
group.

L_12a:
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.src.array, BP.src.addr
        add     BP.src.addr, BP.src.addr, (4 * MAX_WORDS)
        jmpt    BP.src.shift, L_13
        add     BP.src.save, BP.src.array_end, 0
        mtsr    fc, BP.src.shift

.if MAX_WORDS == 32
        extract BP.src.array_31, BP.src.array_30, BP.src.array_31
        ...
        extract BP.src.array_16, BP.src.array_15, BP.src.array_16
.endif

.if MAX_WORDS >= 16
        extract BP.src.array_15, BP.src.array_14, BP.src.array_15
        ...
        extract BP.src.array_08, BP.src.array_07, BP.src.array_08
.endif

.if MAX_WORDS >= 8
        extract BP.src.array_07, BP.src.array_06, BP.src.array_07
        extract BP.src.array_06, BP.src.array_05, BP.src.array_06
        extract BP.src.array_05, BP.src.array_04, BP.src.array_05
        extract BP.src.array_04, BP.src.array_03, BP.src.array_04
.endif

        extract BP.src.array_03, BP.src.array_02, BP.src.array_03
        extract BP.src.array_02, BP.src.array_01, BP.src.array_02
        extract BP.src.array_01, BP.src.array_00, BP.src.array_01
        extract BP.src.array_00, BP.src.extra, BP.src.array_00

If no shift is necessary (the source and destination are
identically aligned), the shift array is skipped; otherwise,
the source words are shifted.

At label L_13, if this is the last group, the right-most
destination word (fetched at L_12) is masked and merged.

L_13:
        jmpf    BP.grp.count, L_14
        mtsrim  cr, (MAX_WORDS - 1)
        and     BP.src.array_end, BP.src.array_end, BP.dst.rgt_mask
        or      BP.src.array_end, BP.src.array_end, BP.dst.rgt_end

At label L_14, the group is written into the destination bit
map, and the destination address is advanced. The group
count is decremented and tested. If further groups are
needed for the scan line, they are moved beginning at
label L_12.

L_14:
        storem  0, 0, BP.src.array, BP.dst.addr
        jmpfdec BP.grp.count, L_12
        add     BP.dst.addr, BP.dst.addr, (4 * MAX_WORDS)



L_15:
        add     BP.dst.lft_addr, BP.dst.lft_addr, GP.mem.width
        jmpfdec Size.y, L_05
        add     BP.src.lft_addr, BP.src.lft_addr, Source.w

L_16:

The scan-line count is decremented and tested. If further
scan lines are necessary, the address of the left edge of
each is calculated at label L_05.

P_B4_01.S

Routine P_B4_01.S is a C-language callable program
that performs a general BITBLT operation in a monochrome
(1-plane) bit map. No clipping is performed.

The caller is responsible for supplying the address of a
routine that combines the two operands after they have
been moved into the source and destination register
blocks, and after the source operand has been shifted to
align with the destination. An example of such a routine
is O5_01.S, which is included on the distribution diskette.
This routine XORs the source array into the destination
array.
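A combining routine of the kind described here might look like the following in C. This is a hypothetical sketch, not the contents of O5_01.S: the signature is invented for illustration, and the skip_bytes parameter is modelled on BP.grp.op_skip, which the listing sets to BP.fst.skip for the first group and to 0 for full groups. It assumes a short group occupies the first n registers of each block. Because the caller restores the destination bits outside the block afterward, the routine can combine whole words.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORDS 32

/* Hypothetical C view of the routine reached through GP.pxl.op_vec. */
typedef void (*combine_fn)(uint32_t *dst, const uint32_t *src, int skip_bytes);

/* XOR the aligned source block into the destination block. */
static void xor_combine(uint32_t *dst, const uint32_t *src, int skip_bytes)
{
    int n = MAX_WORDS - skip_bytes / 4;   /* words present in this group */
    assert(skip_bytes >= 0 && skip_bytes % 4 == 0 && n > 0);
    for (int i = 0; i < n; i++)
        dst[i] ^= src[i];
}
```

Passing a skip of 4 * (MAX_WORDS - 3) bytes, for instance, combines only the first three words, which is how a short first group would be handled.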

The routine begins with the normal global functions. 

Nine parameters are loaded from the structure
G29K_Params.

GP.mem.width    lr7
GP.mem.depth    lr8
GP.wnd.base     lr9
GP.wnd.align    lr10
GP.pxl.op_vec   lr11
GP.pxl.in_mask  lr12
GP.pxl.do_mask  lr13
GP.pxl.do_value lr14
GP.pxl.out_mask lr15

        const   Temp1, _G29K_Params+(5*4)
        consth  Temp1, _G29K_Params+(5*4)
        mtsrim  cr, (9 - 1)
        loadm   0, 0, GP.mem.width, Temp1

The routine checks that the size of the block is neither
negative nor zero in either dimension. If it is, the routine
exits immediately.



        cple    Temp1, Size.x, 0
        jmpt    Temp1, L_12
        sub     Size.y, Size.y, 1
        jmpt    Size.y, L_12
        sub     Size.y, Size.y, 1

From here on, the routine is exactly like P_B3_01.S,
except that the user routine is called to perform the
operation on the two register blocks, and the labels have
been changed.

Routine S_M1_01 is called to convert the destination
address to a linear address. The linear destination
address is left in BP.dst.lft_addr and the alignment is
left in LP.loc.align.

        add     LP.loc.x, Dest.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Dest.y, 0
        add     BP.dst.lft_addr, LP.loc.addr, 0
        add     BP.dst.align, LP.loc.align, 0

Routine S_M1_01 is called a second time to convert the
source address to a linear address. The linear address
of the source is left in variable BP.src.lft_addr. This
address and BP.dst.lft_addr point to the left edge of the
block in the source and destination bit maps, respectively.
They are advanced once per scan line by Source.w and
GP.mem.width, respectively.

        add     BP.grp.count, GP.mem.width, 0
        add     GP.mem.width, Source.w, 0
        add     GP.wnd.base, Source.b, 0
        add     GP.wnd.align, Source.a, 0
        add     LP.loc.x, Source.x, 0
        call    ret, S_M1_01
        add     LP.loc.y, Source.y, 0
        add     GP.mem.width, BP.grp.count, 0
        add     BP.src.lft_addr, LP.loc.addr, 0

The amount that the source must be shifted in order to 
align with the destination is calculated and left in 
BP.src.shift. 

        sub     BP.src.shift, LP.loc.align, BP.dst.align

Masks are generated for the left and right end of the des- 
tination field. These will be used at the beginning and 
end of each scan line to avoid affecting partial words not 
actually inside the destination. 

        constn  BP.dst.rgt_mask, -1
        srl     BP.dst.lft_mask, BP.dst.rgt_mask, BP.dst.align
        add     BP.grp.align, BP.dst.align, Size.x
        add     BP.grp.repeat, BP.grp.align, 31
        subr    BP.grp.align, BP.grp.align, 32
        and     BP.grp.align, BP.grp.align, 31
        sll     BP.dst.rgt_mask, BP.dst.rgt_mask, BP.grp.align

The number of groups in each scan line is calculated. 
Each group except the first will contain exactly 32 words; 
any "extra" words will be in the first group. 

For the first group, there is a sequence of shift
instructions beginning at L_05. Since the first group may
not contain all 32 words, an indirect jump address into the
sequence is generated and left in BP.fst.shift. The
indirect address pointers are set to the source array and
destination array. The address of the right-most word of
the first scan line of the source array is computed.

        sll     BP.fst.incr, BP.fst.count, 2
        subr    BP.fst.skip, BP.fst.incr, (4 * (MAX_WORDS - 1))
        const   BP.fst.shift, L_05
        consth  BP.fst.shift, L_05
        add     BP.fst.shift, BP.fst.shift, BP.fst.skip
        setip   BP.dst.array, BP.src.array, BP.src.array
        mfsr    BP.dst.rgt_ptr, ipc
        add     BP.dst.rgt_ptr, BP.dst.rgt_ptr, BP.fst.incr
        mfsr    BP.src.rgt_ptr, ipa
        add     BP.src.rgt_ptr, BP.src.rgt_ptr, BP.fst.incr
        add     BP.fst.incr, BP.fst.incr, 4

The sign of the relative alignment is placed into
BP.grp.align; it will be used to determine whether to do a
left or right shift. BP.src.shift is set to the value to be
loaded into the funnel count register for use with the
extract instructions when shifting the source registers.
The value is appropriate for either left or right shifting.
If the shift amount is zero (that is, no shifting is
needed), the sign bit is set for use by a conditional jump
placed before the sequence of extract instructions.

        cpgt    BP.grp.align, BP.src.shift, 0
        add     BP.src.shift, BP.src.shift, 32
        and     BP.src.shift, BP.src.shift, 31
        cpeq    BP.src.save, BP.src.shift, 0
        or      BP.src.shift, BP.src.shift, BP.src.save

Each scan line begins at L_01. Copies of the source and
destination addresses of the left edge of the scan line and
the group count are moved into working registers.



L_01:
        add     BP.grp.count, BP.grp.repeat, 0
        add     BP.dst.addr, BP.dst.lft_addr, 0
        add     BP.src.addr, BP.src.lft_addr, 0
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.dst.array, BP.dst.addr
        mtsr    ipa, BP.dst.rgt_ptr
        andn    BP.dst.lft_end, BP.dst.array, BP.dst.lft_mask
        jmpf    BP.grp.align, L_03
        andn    BP.dst.rgt_end, gr0, BP.dst.rgt_mask
        add     BP.src.extra, BP.fst.count, 1
        mtsr    cr, BP.src.extra
        loadm   0, 0, BP.src.extra, BP.src.addr
        jmp     L_04
        add     BP.src.addr, BP.src.addr, 4

The word at the left end of the first group of the
destination (which may be an incomplete word) is fetched,
and the portion outside the destination is retained and
eventually written back into memory. If the first group is
the only group, the word at the right end is fetched and
masked as well.

The alignment direction flag in BP.grp.align is tested. A
right shift results in a jump to L_03. If a left shift is
necessary, an extra word is loaded into a register that is
prefixed to the source register block. The remaining words
of the source array are loaded into the register block.

L_03:
        mtsr    cr, BP.fst.count
        loadm   0, 0, BP.src.array, BP.src.addr

At label L_04, the right-most source word of the first
group is saved to be prefixed to the next group.

L_04:
        mtsr    ipa, BP.src.rgt_ptr
        jmpt    BP.src.shift, L_06
        add     BP.src.save, gr0, 0
        jmpi    BP.fst.shift
        mtsr    fc, BP.src.shift



If the source and destination are identically aligned, no
shifts are necessary, and the code jumps over the shift
array to label L_06. This is the case regardless of the
absolute alignment of the operands, which is handled by
the special treatment of words at the left and right ends
of the scan line.

L_05:
.if MAX_WORDS == 32
        extract BP.src.array_31, BP.src.array_30, BP.src.array_31
        ...
        extract BP.src.array_16, BP.src.array_15, BP.src.array_16
.endif

.if MAX_WORDS >= 16
        extract BP.src.array_15, BP.src.array_14, BP.src.array_15
        ...
        extract BP.src.array_08, BP.src.array_07, BP.src.array_08
.endif

.if MAX_WORDS >= 8
        extract BP.src.array_07, BP.src.array_06, BP.src.array_07
        extract BP.src.array_06, BP.src.array_05, BP.src.array_06
        extract BP.src.array_05, BP.src.array_04, BP.src.array_05
        extract BP.src.array_04, BP.src.array_03, BP.src.array_04
.endif

        extract BP.src.array_03, BP.src.array_02, BP.src.array_03
        extract BP.src.array_02, BP.src.array_01, BP.src.array_02
        extract BP.src.array_01, BP.src.array_00, BP.src.array_01
        extract BP.src.array_00, BP.src.extra, BP.src.array_00

If a shift is necessary, the code jumps into the sequence
of shift instructions at the entry point held in
BP.fst.shift. Each instruction either left- or right-shifts
two adjacent registers in the source block and leaves the
result in the register that is further to the right.

At label L_06, the user-supplied routine is called to
perform the operation. The word at the left end of the
destination array is restored. If there is only one group,
the bits outside the destination (in the right-most word)
are placed into the right-most word of the destination
array.



L_06:
        calli   ret, GP.pxl.op_vec
        add     BP.grp.op_skip, BP.fst.skip, 0
        and     BP.dst.array, BP.dst.array, BP.dst.lft_mask
        jmpf    BP.grp.count, L_07
        or      BP.dst.array, BP.dst.array, BP.dst.lft_end
        mtsr    ipa, BP.dst.rgt_ptr
        mtsr    ipc, BP.dst.rgt_ptr
        nop
        and     gr0, gr0, BP.dst.rgt_mask
        or      gr0, gr0, BP.dst.rgt_end



At label L_07, the resulting register block is written into
the destination bit map. If there is only a single group per
scan line, the code jumps to L_11.

L_07:
        mtsr    cr, BP.fst.count
        jmpt    BP.grp.count, L_11
        storem  0, 0, BP.dst.array, BP.dst.addr
        add     BP.dst.addr, BP.dst.addr, BP.fst.incr
        add     BP.src.addr, BP.src.addr, BP.fst.incr
        sub     BP.grp.count, BP.grp.count, 1

If there is more than one group per scan line, the
addresses are adjusted by the amount moved in the first
group, and the code enters a loop at L_08 to process the
rest of the scan line, 32 words at a time.

At label L_08, the 32 words of the destination array are
loaded into the destination register block, and the
right-end word is saved.

L_08:
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.dst.array, BP.dst.addr
        andn    BP.dst.rgt_end, BP.dst.array_end, BP.dst.rgt_mask
        add     BP.src.extra, BP.src.save, 0
        mtsrim  cr, (MAX_WORDS - 1)
        loadm   0, 0, BP.src.array, BP.src.addr
        add     BP.src.addr, BP.src.addr, (4 * MAX_WORDS)
        jmpt    BP.src.shift, L_09
        add     BP.src.save, BP.src.array_end, 0
        mtsr    fc, BP.src.shift

.if MAX_WORDS == 32
        extract BP.src.array_31, BP.src.array_30, BP.src.array_31
        ...
        extract BP.src.array_16, BP.src.array_15, BP.src.array_16
.endif

.if MAX_WORDS >= 16
        extract BP.src.array_15, BP.src.array_14, BP.src.array_15
        ...
        extract BP.src.array_08, BP.src.array_07, BP.src.array_08
.endif

.if MAX_WORDS >= 8
        extract BP.src.array_07, BP.src.array_06, BP.src.array_07
        extract BP.src.array_06, BP.src.array_05, BP.src.array_06
        extract BP.src.array_05, BP.src.array_04, BP.src.array_05
        extract BP.src.array_04, BP.src.array_03, BP.src.array_04
.endif

        extract BP.src.array_03, BP.src.array_02, BP.src.array_03
        extract BP.src.array_02, BP.src.array_01, BP.src.array_02
        extract BP.src.array_01, BP.src.array_00, BP.src.array_01
        extract BP.src.array_00, BP.src.extra, BP.src.array_00

The right-most word from the previous group is prefixed, 
and then the 32 words are fetched from the source bit 
map. The source address is incremented, and the right- 
most word is saved for the next group. 

If no shift is necessary (the source and destination are
identically aligned), the shift array is skipped; otherwise,
the source words are shifted.

At label L_09, the user routine is called to perform the 
operation. The right-most destination word is masked 
and merged if this is the last group. 

L_09:
        calli   ret, GP.pxl.op_vec
        const   BP.grp.op_skip, 0
        jmpf    BP.grp.count, L_10
        mtsrim  cr, (MAX_WORDS - 1)
        and     BP.dst.array_end, BP.dst.array_end, BP.dst.rgt_mask
        or      BP.dst.array_end, BP.dst.array_end, BP.dst.rgt_end

At label L_10, the group is written into the destination bit
map, and the destination address is advanced. The group
count is decremented and tested. If further groups are
needed for the scan line, they are moved beginning at
label L_08.

L_10:
        storem  0, 0, BP.dst.array, BP.dst.addr
        jmpfdec BP.grp.count, L_08
        add     BP.dst.addr, BP.dst.addr, (4 * MAX_WORDS)

L_11:
        add     BP.dst.lft_addr, BP.dst.lft_addr, GP.mem.width
        jmpfdec Size.y, L_01
        add     BP.src.lft_addr, BP.src.lft_addr, Source.w
L_12:

The scan line count is decremented and tested. If further
scan lines are necessary, the address of the left edge of
each is calculated at label L_01.

P_B4_02.S

Routine P_B4_02.S is a C-language callable program
that performs a copy-block operation in a monochrome
(1-plane) bit map. Clipping is performed.

The caller is responsible for supplying the address of a
routine that combines the two operands after they have
been moved into the source and destination register
blocks and the source operand has been shifted to align
with the destination. An example of such a routine is
O5_01.S, which is included on the distribution diskette.
This routine XORs the source array into the destination
array.
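The division of labor between the copy routine and the caller-supplied operation can be modelled in C. This is a sketch, not the O5_01.S listing: `pixel_op`, `op_xor`, and `combine_line` are hypothetical names, the source words are assumed to be already aligned with the destination, and bit 31 is taken as the left-most pixel of a word. Only bits inside the end masks may change; the rest of each end word keeps its destination value.

```c
#include <assert.h>
#include <stdint.h>

/* Caller-supplied operation, analogous to the routine addressed by GP.pxl.op_vec. */
typedef uint32_t (*pixel_op)(uint32_t src, uint32_t dst);

/* Example operation: XOR the source word into the destination word. */
uint32_t op_xor(uint32_t src, uint32_t dst)
{
    return src ^ dst;
}

/* Combine n aligned source words into dst. Bits set in lft_mask (first word)
   and rgt_mask (last word) lie inside the block; every other bit of those
   words keeps its original destination value. */
void combine_line(uint32_t *dst, const uint32_t *src, unsigned n,
                  uint32_t lft_mask, uint32_t rgt_mask, pixel_op op)
{
    assert(n > 0);
    for (unsigned i = 0; i < n; i++) {
        uint32_t keep = 0xFFFFFFFFu;          /* bits the operation may change */
        if (i == 0)
            keep &= lft_mask;
        if (i == n - 1)
            keep &= rgt_mask;
        uint32_t result = op(src[i], dst[i]);
        dst[i] = (result & keep) | (dst[i] & ~keep);
    }
}
```

The listings reach the same result a different way: they save the bits outside the block before calling the operation and merge them back afterwards, so the operation itself never has to deal with the masks.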

The routine begins with the normal global functions. 
Fourteen parameters are loaded from the structure
G29K_Params.



GP.wnd.min_x      lr2
GP.wnd.max_x      lr3
GP.wnd.min_y      lr4
GP.wnd.max_y      lr5
GP.pxl.value      lr6
GP.mem.width      lr7
GP.mem.depth      lr8
GP.wnd.base       lr9
GP.wnd.align      lr10
GP.pxl.op_vec     lr11
GP.pxl.in_mask    lr12
GP.pxl.do_mask    lr13
GP.pxl.do_value   lr14
GP.pxl.out_mask   lr15



    const   Temp1,_G29K_Params
    consth  Temp1,_G29K_Params
    mtsrim  cr,(14 - 1)
    loadm   0,0,GP.wnd.min_x,Temp1

From this point on, the routine is just like P_B3_02.S,
except that the user routine is called to perform the
operation on the operands.

The destination array is cropped so that it consists only
of the part of the original destination array that lies
within the clipping window. The left edge of the
destination array is cropped, if necessary, to the left edge
of the clipping window. If the left edge must be cropped,
the left edge of the source array and the block size are
adjusted as well.

    sub     Temp1,Dest.x,GP.wnd.min_x
    jmpf    Temp1,L_01
    add     Temp3,GP.wnd.max_x,1
    add     Size.x,Size.x,Temp1
    sub     Source.x,Source.x,Temp1
    add     Dest.x,GP.wnd.min_x,0

If necessary, the right edge of the destination array is
cropped to the right edge of the clipping window. If the 
right edge must be adjusted, the block size is adjusted 
as well. 



L_01:
    add     Temp1,Dest.x,Size.x
    sub     Temp1,Temp3,Temp1
    jmpf    Temp1,L_02
    add     Temp2,GP.mem.width,0
    add     Size.x,Size.x,Temp1



If the width of the resulting block is less than or equal to 
zero, the routine exits immediately. 



L_02:
    cple    Temp1,Size.x,0
    jmpt    Temp1,L_16
    sub     Temp1,GP.wnd.max_y,Dest.y



The top edge of the destination array is cropped, if nec-
essary, to the top edge of the clipping window. If the top 
edge must be adjusted, the source top and the block 
size are adjusted as well. 

    jmpf    Temp1,L_03
    sub     Temp3,GP.wnd.min_y,1
    add     Size.y,Size.y,Temp1
    add     Source.y,Source.y,Temp1
    add     Dest.y,GP.wnd.max_y,0

The bottom edge of the destination array is cropped, if
necessary, to the bottom edge of the clipping window. If 
the bottom edge must be adjusted, the block size is ad- 
justed as well. 



L_03:
    sub     Temp1,Dest.y,Size.y
    sub     Temp1,Temp1,Temp3
    jmpf    Temp1,L_04
    sub     Size.y,Size.y,1
    add     Size.y,Size.y,Temp1



If the height of the resulting block is less than or equal to 
zero, the routine exits immediately. 

L_04:
    jmpt    Size.y,L_16
    sub     Size.y,Size.y,1
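The cropping arithmetic above can be restated in C. This sketch is an illustration under stated assumptions: the struct and function names are hypothetical, and, as the listing's use of GP.wnd.max_y for the top edge implies, y is assumed to decrease from the top scan line of a block to its bottom. The function returns the true height; the listing instead keeps a decremented copy of Size.y as its scan-line loop counter.

```c
#include <assert.h>

/* Clipping window with inclusive bounds (GP.wnd.min_x ... GP.wnd.max_y). */
typedef struct { int min_x, max_x, min_y, max_y; } clip_window;

/* Block-copy request. src_y and dst_y name the TOP scan line; y is assumed
   to decrease toward the bottom of the block. */
typedef struct { int src_x, src_y, dst_x, dst_y, w, h; } block;

/* Crop b to the window; returns 0 when nothing remains to be drawn. */
int clip_block(const clip_window *win, block *b)
{
    assert(win && b);
    int d = b->dst_x - win->min_x;               /* left edge */
    if (d < 0) {
        b->w += d;
        b->src_x -= d;
        b->dst_x = win->min_x;
    }
    d = (win->max_x + 1) - (b->dst_x + b->w);    /* right edge */
    if (d < 0)
        b->w += d;
    if (b->w <= 0)
        return 0;

    d = win->max_y - b->dst_y;                   /* top edge */
    if (d < 0) {
        b->h += d;
        b->src_y += d;
        b->dst_y = win->max_y;
    }
    d = (b->dst_y - b->h + 1) - win->min_y;      /* bottom edge */
    if (d < 0)
        b->h += d;
    return b->h > 0;
}
```

For example, a 20 x 30 block aimed at (-5, 120) in a 100 x 100 window loses 5 columns on the left and 21 scan lines at the top, and the source origin moves by the same amounts.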

Routine S_M1_01 is called to convert the destination
coordinates to a linear address and alignment. The
results are left in BP.dst.lft_addr and BP.dst.align.

    add     LP.loc.x,Dest.x,0
    call    ret,S_M1_01
    add     LP.loc.y,Dest.y,0
    add     BP.dst.lft_addr,LP.loc.addr,0
    add     BP.dst.align,LP.loc.align,0

Routine S_M1_01 is called a second time to convert the 
source coordinates to a linear address and alignment. 
The base and width of the source bit map may be differ- 
ent from those of the destination. 

    add     BP.grp.count,GP.mem.width,0
    add     GP.mem.width,Source.w,0
    add     GP.wnd.base,Source.b,0
    add     GP.wnd.align,Source.a,0
    add     LP.loc.x,Source.x,0
    call    ret,S_M1_01
    add     LP.loc.y,Source.y,0
    add     GP.mem.width,BP.grp.count,0
    add     BP.src.lft_addr,LP.loc.addr,0

The difference in alignment between the source and
destination is calculated to determine how far the source
must be shifted. This is left in BP.src.shift.

    sub     BP.src.shift,LP.loc.align,BP.dst.align

The masks for the left and right ends of a destination line 
are formed. These will be used to mask bits in the words 
at the ends of each scan line that are not in the destina- 
tion block. 

    constn  BP.dst.rgt_mask,-1
    srl     BP.dst.lft_mask,BP.dst.rgt_mask,BP.dst.align
    add     BP.grp.align,BP.dst.align,Size.x
    add     BP.grp.repeat,BP.grp.align,31
    subr    BP.grp.align,BP.grp.align,32
    and     BP.grp.align,BP.grp.align,31
    sll     BP.dst.rgt_mask,BP.dst.rgt_mask,BP.grp.align

The number of groups in each scan line and the number
of words in the first or only group of each scan line are
computed. All other groups of each scan line will be
exactly MAX_WORDS (32) words.

    srl     BP.grp.repeat,BP.grp.repeat,5
    sub     BP.grp.repeat,BP.grp.repeat,1
    and     BP.fst.count,BP.grp.repeat,(MAX_WORDS - 1)
    srl     BP.grp.repeat,BP.grp.repeat,MAX_SHIFT
    sub     BP.grp.repeat,BP.grp.repeat,1
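The mask and group arithmetic can be checked with a C model that mirrors the instruction sequence above. It is a sketch: the `line_plan` type and `plan_line` function are hypothetical, MAX_WORDS is 32 and MAX_SHIFT is 5 as in the listings, and bit 31 is assumed to hold the left-most pixel of a word (which is what shifting the all-ones mask right by the alignment implies).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORDS 32   /* words per group, as in the listings */
#define MAX_SHIFT 5    /* log2(MAX_WORDS)                     */

typedef struct {
    uint32_t lft_mask;  /* 1 bits: pixels of the left word inside the block  */
    uint32_t rgt_mask;  /* 1 bits: pixels of the right word inside the block */
    unsigned words;     /* 32-bit words touched by one scan line             */
    unsigned first;     /* words in the first (or only) group                */
    unsigned full;      /* full MAX_WORDS groups that follow the first       */
} line_plan;

/* align: bit offset (0..31) of the first pixel in its word; width: pixels (>= 1). */
line_plan plan_line(unsigned align, unsigned width)
{
    assert(align < 32 && width > 0);
    line_plan p;
    unsigned end = align + width;                         /* BP.grp.align      */
    p.lft_mask = 0xFFFFFFFFu >> align;                    /* srl by dst.align  */
    p.rgt_mask = 0xFFFFFFFFu << ((32u - end) & 31u);      /* subr, and, sll    */
    p.words = (end + 31u) >> 5;
    p.first = ((p.words - 1u) & (MAX_WORDS - 1u)) + 1u;   /* BP.fst.count + 1  */
    p.full = (p.words - 1u) >> MAX_SHIFT;                 /* BP.grp.repeat + 1 */
    return p;
}
```

For a 10-pixel block starting at bit offset 5, both ends fall in one word, so the effective mask is lft_mask & rgt_mask = 0x07FE0000, ten 1 bits. A 40-word scan line splits into a first group of 8 words followed by one full group of 32.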

For the first group, there is a sequence of shift
instructions beginning at L_09. Since the first group may
not contain all 32 words, an indirect jump address into
the sequence is generated and left in BP.fst.shift. The
indirect address pointers are set to the source array and
destination array. The address of the right-most word of
the first scan line of the source array is computed.

    sll     BP.fst.incr,BP.fst.count,2
    subr    BP.fst.skip,BP.fst.incr,(4 * (MAX_WORDS - 1))
    const   BP.fst.shift,L_09
    consth  BP.fst.shift,L_09
    add     BP.fst.shift,BP.fst.shift,BP.fst.skip
    setip   BP.dst.array,BP.src.array,BP.src.array
    mfsr    BP.dst.rgt_ptr,ipc
    add     BP.dst.rgt_ptr,BP.dst.rgt_ptr,BP.fst.incr
    mfsr    BP.src.rgt_ptr,ipa
    add     BP.src.rgt_ptr,BP.src.rgt_ptr,BP.fst.incr
    add     BP.fst.incr,BP.fst.incr,4
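The entry-point arithmetic is plain byte counting, because every Am29000 instruction is 4 bytes long. The C sketch below (hypothetical names) reproduces it: jumping BP.fst.skip bytes into the MAX_WORDS-instruction sequence leaves exactly BP.fst.count + 1 extract instructions to run, one per word of the first group, and the first group advances the memory addresses by 4 * (BP.fst.count + 1) bytes.

```c
#include <assert.h>

#define MAX_WORDS 32   /* extract instructions in the unrolled sequence */

/* fst_count is BP.fst.count: the number of words in the first group minus one. */
typedef struct {
    unsigned skip;       /* bytes of extract instructions jumped over (BP.fst.skip) */
    unsigned executed;   /* extract instructions that actually run                 */
    unsigned addr_incr;  /* bytes of memory covered by the first group             */
} first_group;

first_group plan_first_group(unsigned fst_count)
{
    assert(fst_count < MAX_WORDS);
    first_group g;
    unsigned incr = 4u * fst_count;          /* sll BP.fst.incr,BP.fst.count,2 */
    g.skip = 4u * (MAX_WORDS - 1u) - incr;   /* subr BP.fst.skip,...           */
    g.executed = MAX_WORDS - g.skip / 4u;    /* each instruction is 4 bytes    */
    g.addr_incr = incr + 4u;                 /* add BP.fst.incr,BP.fst.incr,4  */
    return g;
}
```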

The sign of the relative alignment is placed into 
BP.grp.align; it will be used to determine whether to do a 
left or right shift. 

    cpgt    BP.grp.align,BP.src.shift,0
    add     BP.src.shift,BP.src.shift,32
    and     BP.src.shift,BP.src.shift,31
    cpeq    BP.src.save,BP.src.shift,0
    or      BP.src.shift,BP.src.shift,BP.src.save
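The relative-alignment step reduces to a subtraction and a mask. In this C sketch (names hypothetical), `extra_word` corresponds to the BP.grp.align flag that selects the path that prepends an extra source word, `fc` is the funnel-shift count later written to FC, and `aligned` corresponds to the bit that the cpeq/or pair sets in BP.src.shift so that a jmpt can skip the shift array.

```c
#include <assert.h>

typedef struct {
    unsigned fc;      /* funnel-shift count for the FC register (0..31)         */
    int extra_word;   /* nonzero when the alignment difference is positive      */
    int aligned;      /* nonzero when source and destination need no shifting   */
} rel_shift;

rel_shift relative_shift(int src_align, int dst_align)
{
    assert(src_align >= 0 && src_align < 32 && dst_align >= 0 && dst_align < 32);
    rel_shift r;
    int shift = src_align - dst_align;      /* sub BP.src.shift,...   */
    r.extra_word = shift > 0;               /* cpgt BP.grp.align,...  */
    r.fc = (unsigned)(shift + 32) & 31u;    /* add 32, then and 31    */
    r.aligned = (r.fc == 0);                /* cpeq, then or          */
    return r;
}
```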

Each scan line begins at L_05. Copies of the source and 
destination addresses of the edge of the scan lines and 
the group count are moved into working registers. 



L_05:
    add     BP.grp.count,BP.grp.repeat,0
    add     BP.dst.addr,BP.dst.lft_addr,0
    add     BP.src.addr,BP.src.lft_addr,0
    mtsr    cr,BP.fst.count
    loadm   0,0,BP.dst.array,BP.dst.addr
    mtsr    ipa,BP.dst.rgt_ptr
    andn    BP.dst.lft_end,BP.dst.array,BP.dst.lft_mask
    jmpf    BP.grp.align,L_07
    andn    BP.dst.rgt_end,gr0,BP.dst.rgt_mask

    add     BP.src.extra,BP.fst.count,1
    mtsr    cr,BP.src.extra
    loadm   0,0,BP.src.extra,BP.src.addr
    jmp     L_08
    add     BP.src.addr,BP.src.addr,4

The word at the left end of the first group (which may be 
an incomplete word) of the destination is fetched, and 
the part outside the destination is saved and eventually 
written back into memory. If the first group is the only 
group, the word at the right end is fetched and masked 
as well. 

The alignment direction flag in BP.grp.align is tested. A
right shift results in a jump to L_07. If a left shift is neces- 
sary, an extra word is loaded into a register that is pre- 
pended to the source register block. The remaining 
words of the source array are loaded into the register 
block. 



L_07:
    mtsr    cr,BP.fst.count
    loadm   0,0,BP.src.array,BP.src.addr



At label L_08, the right-most source word of the first 
group is saved to be prefixed to the next group. 



L_08:
    mtsr    ipa,BP.src.rgt_ptr
    jmpt    BP.src.shift,L_10
    add     BP.src.save,gr0,0
    jmpi    BP.fst.shift
    mtsr    fc,BP.src.shift



If the source and destination are identically aligned, no
shifts are necessary, and the code jumps over the shift
array to label L_10. This is the case regardless of the
absolute alignment of the operands, which is handled by
the special treatment of words at the left and right ends
of the scan line. If a shift is necessary, the code jumps
somewhere into the sequence of shift instructions. Each
instruction either left- or right-shifts two adjacent
registers in the source block and leaves the result in the
register that is further to the right.
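A C model makes the shift array easier to follow. It rests on an assumption about EXTRACT that should be checked against the Am29000 User's Manual: the 64-bit concatenation of the first and second source registers is shifted left by FC and the upper 32 bits are kept. The group is scaled down to four words, the names are hypothetical, and the `switch` with fall-through cases plays the role of the computed jmpi into the unrolled sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of EXTRACT: shift the 64-bit value hi:lo left by fc (0..31)
   and keep the upper 32 bits. */
uint32_t extract(uint32_t hi, uint32_t lo, unsigned fc)
{
    assert(fc < 32);
    uint64_t v = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((v << fc) >> 32);
}

/* Align a group of up to 4 words (a scaled-down MAX_WORDS). last is the index
   of the right-most word in use; extra is the word that precedes a[0].
   Entering at case `last` mirrors the jmpi into the unrolled sequence. Words
   are processed right to left, so each extract still sees the unmodified
   word to its left. */
void align_group(uint32_t a[4], unsigned last, uint32_t extra, unsigned fc)
{
    assert(last < 4);
    switch (last) {
    case 3: a[3] = extract(a[2], a[3], fc); /* fall through */
    case 2: a[2] = extract(a[1], a[2], fc); /* fall through */
    case 1: a[1] = extract(a[0], a[1], fc); /* fall through */
    case 0: a[0] = extract(extra, a[0], fc);
    }
}
```

Because the update runs from the right-most word toward the left, no temporary copy of the register block is needed, which is the same reason the listing orders its extract instructions from array_31 down to array_00.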

L_09:
.if MAX_WORDS == 32
    extract BP.src.array_31,BP.src.array_30,BP.src.array_31
    ...
    extract BP.src.array_16,BP.src.array_15,BP.src.array_16
.endif

.if MAX_WORDS >= 16
    extract BP.src.array_15,BP.src.array_14,BP.src.array_15
    ...
    extract BP.src.array_08,BP.src.array_07,BP.src.array_08
.endif

.if MAX_WORDS >= 8
    extract BP.src.array_07,BP.src.array_06,BP.src.array_07
    extract BP.src.array_06,BP.src.array_05,BP.src.array_06
    extract BP.src.array_05,BP.src.array_04,BP.src.array_05
    extract BP.src.array_04,BP.src.array_03,BP.src.array_04
.endif

    extract BP.src.array_03,BP.src.array_02,BP.src.array_03
    extract BP.src.array_02,BP.src.array_01,BP.src.array_02
    extract BP.src.array_01,BP.src.array_00,BP.src.array_01
    extract BP.src.array_00,BP.src.extra,BP.src.array_00



At label L_10, the operation routine is called. The bits
outside the destination (in the left-most word) are
merged back into the left-most word of the destination
array. If there is only one group, the bits outside the
destination (in the right-most word) are merged back into
the right-most word of the destination array.



L_10:
    calli   ret,GP.pxl.op_vec
    add     BP.grp.op_skip,BP.fst.skip,0
    and     BP.dst.array,BP.dst.array,BP.dst.lft_mask
    jmpf    BP.grp.count,L_11
    or      BP.dst.array,BP.dst.array,BP.dst.lft_end
    mtsr    ipa,BP.dst.rgt_ptr
    mtsr    ipc,BP.dst.rgt_ptr
    nop
    and     gr0,gr0,BP.dst.rgt_mask
    or      gr0,gr0,BP.dst.rgt_end



At label L_11, the resulting register block is written into
the destination bit map. If there is only a single group per
scan line, the code jumps to L_15.

L_11:
    mtsr    cr,BP.fst.count
    jmpt    BP.grp.count,L_15
    storem  0,0,BP.dst.array,BP.dst.addr
    add     BP.dst.addr,BP.dst.addr,BP.fst.incr
    add     BP.src.addr,BP.src.addr,BP.fst.incr
    sub     BP.grp.count,BP.grp.count,1

If there is more than one group per scan line, the
addresses are adjusted by the amount moved in the first
group, and the code enters a loop at L_12 to move the
rest of the scan line 32 words at a time.

At label L_12, the destination words are fetched. The 
bits to the right of the destination block are preserved in 
case this is the last group. 



L_12:
    mtsrim  cr,(MAX_WORDS - 1)
    loadm   0,0,BP.dst.array,BP.dst.addr
    andn    BP.dst.rgt_end,BP.dst.array_end,BP.dst.rgt_mask
    add     BP.src.extra,BP.src.save,0
    mtsrim  cr,(MAX_WORDS - 1)
    loadm   0,0,BP.src.array,BP.src.addr
    add     BP.src.addr,BP.src.addr,(4 * MAX_WORDS)
    jmpt    BP.src.shift,L_13
    add     BP.src.save,BP.src.array_end,0
    mtsr    fc,BP.src.shift



.if MAX_WORDS == 32
    extract BP.src.array_31,BP.src.array_30,BP.src.array_31
    ...
    extract BP.src.array_16,BP.src.array_15,BP.src.array_16
.endif

.if MAX_WORDS >= 16
    extract BP.src.array_15,BP.src.array_14,BP.src.array_15
    ...
    extract BP.src.array_08,BP.src.array_07,BP.src.array_08
.endif

.if MAX_WORDS >= 8
    extract BP.src.array_07,BP.src.array_06,BP.src.array_07
    extract BP.src.array_06,BP.src.array_05,BP.src.array_06
    extract BP.src.array_05,BP.src.array_04,BP.src.array_05
    extract BP.src.array_04,BP.src.array_03,BP.src.array_04
.endif

    extract BP.src.array_03,BP.src.array_02,BP.src.array_03
    extract BP.src.array_02,BP.src.array_01,BP.src.array_02
    extract BP.src.array_01,BP.src.array_00,BP.src.array_01
    extract BP.src.array_00,BP.src.extra,BP.src.array_00

The right-most word from the previous group is prefixed,
and then 32 words are fetched from the source bit map.
The source address is incremented, and the right-most
word is saved for the next group.

If no shift is necessary (because the source and
destination are identically aligned), the shift array is
skipped. If a shift is necessary, it takes place.

At label L_13, the user routine is called to perform the
operation. If this is the last group, the right-end word is
restored.

L_13:
    calli   ret,GP.pxl.op_vec
    const   BP.grp.op_skip,0
    jmpf    BP.grp.count,L_14
    mtsrim  cr,(MAX_WORDS - 1)
    and     BP.dst.array_end,BP.dst.array_end,BP.dst.rgt_mask
    or      BP.dst.array_end,BP.dst.array_end,BP.dst.rgt_end

At label L_14, the group is written into the destination bit
map and the destination address is modified. The group
count is decremented and tested. If further groups are
necessary for the scan line, they are moved beginning at
label L_12.

L_14:
    storem  0,0,BP.dst.array,BP.dst.addr
    jmpfdec BP.grp.count,L_12
    add     BP.dst.addr,BP.dst.addr,(4 * MAX_WORDS)

L_15:
    add     BP.dst.lft_addr,BP.dst.lft_addr,GP.mem.width
    jmpfdec Size.y,L_05
    add     BP.src.lft_addr,BP.src.lft_addr,Source.w
L_16:



Text Routines 

There are two C-language callable routines for text
operations: P_T1_01.S, which does not perform clipping,
and P_T1_02.S, which does.

The rasterized characters must be stored in memory
before the text routines can be called. The first word of
each character form specifies its size. The following
words contain the bit patterns for the character. The bits
begin with the top row, left to right, and continue with
following rows, left to right. The bits are packed into just
as many words as are necessary to contain them. The
only unused bits are the least-significant bits of the last
word. The first word contains five fields, as shown in
Table 10.

Table 10. Bit Assignments for First Word of Rasterized Character Format

Bits    Function               Value Range   Value in Figure 12
31-26   Height in scan lines   0...63        9
25-20   Width in pixels        0...63        7
19-14   Inset to left side     -32...31      1
13-07   Ascent to top          -32...95      8
06-00   Pitch to next char     0...127       9



In Figure 12, the hex bit patterns used to form the
character 'A' were 0x1020E1C6, 0xCDBFE3C6.
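Decoding the first word is a set of shifts and masks, matching the srl/and/sub sequence in the listings (height in bits 31-26, as the srl by 26 implies; inset and ascent stored with an offset of 32). In this C sketch the names are hypothetical. The header value 0x24785409 used in the check does not appear in the handbook; it is what packing the Figure 12 values (height 9, width 7, inset 1, ascent 8, pitch 9) into these fields gives. The two pattern words are the ones quoted above, and reading them 7 pixels per row reproduces a 9-row 'A'.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the first word of a rasterized character. */
typedef struct {
    unsigned high;    /* bits 31-26: height in scan lines          */
    unsigned wide;    /* bits 25-20: width in pixels               */
    int inset;        /* bits 19-14: stored with an offset of +32  */
    int ascent;       /* bits 13-07: stored with an offset of +32  */
    unsigned pitch;   /* bits 06-00: advance to the next character */
} char_hdr;

char_hdr decode_char_hdr(uint32_t w)
{
    char_hdr h;
    h.high   = w >> 26;
    h.wide   = (w >> 20) & 63u;
    h.inset  = (int)((w >> 14) & 63u) - 32;
    h.ascent = (int)((w >> 7) & 127u) - 32;
    h.pitch  = w & 127u;
    return h;
}

/* Pixel (row, col) of a packed pattern: rows top to bottom, pixels left to
   right, most-significant bit first, with no padding between rows. */
int char_pixel(const uint32_t *pattern, unsigned wide, unsigned row, unsigned col)
{
    assert(col < wide);
    unsigned bit = row * wide + col;
    return (int)((pattern[bit >> 5] >> (31u - (bit & 31u))) & 1u);
}
```

The 7 x 9 cell needs 63 bits, so only the least-significant bit of the second pattern word is unused, as the text above describes.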

The two text routines begin with the same declarations.
The function name is declared to be global, the ENTER
macro is used to specify that 48 general registers are
required, and the routine name appears as a label.

.global _P_T1_01
ENTER TEXT_PRIMITIVE
_P_T1_01:

The macros used to form the words are in routine
TEST_T1.C.

The three parameter register names are declared with
PARAM macros. These assign local-register numbers
higher than (or above) the registers previously defined.
These parameters are passed in the local registers
shown below:

Macro   Register Name   Register Number
PARAM   Pos.x           lr50
PARAM   Pos.y           lr51
PARAM   Form            lr52



The scan line count is decremented and tested. If further
scan lines are necessary, the address of the left edge of
each is calculated at label L_05.
Figure 12. Character Parameters (diagram labels: Character Position, Next Character Position)



Pos.x and Pos.y together indicate where the character 
is to be drawn. The routine updates Pos.x according to 
the pitch parameter of the character drawn. Form is the 
address of the character definition. 

The CLAIM macro is the function prologue. If a spill
operation is not necessary, this consists of five
instructions. If a spill is necessary, the standard SPILL
routine is used, which may involve a Load/Store Multiple
instruction.

P_T1_01.S 

Routine P_T1_01.S draws the indicated character into 
a color bit map at the specified location. No clipping is 
performed. The routine begins with the normal global 
functions. 

Five parameters are loaded from the structure
G29K_Params.



GP.pxl.value      lr6
GP.mem.width      lr7
GP.mem.depth      lr8
GP.wnd.base       lr9
GP.wnd.align      lr10



    const   Temp0,_G29K_Params + (4 * 4)
    consth  Temp0,_G29K_Params + (4 * 4)
    mtsrim  cr,(5 - 1)
    loadm   0,0,GP.pxl.value,Temp0

The first word of the character form is loaded into Temp0
and divided into its five components. The variables
TP.chr.inset and TP.chr.ascent are each offset by 32
and must be adjusted.

    load    0,0,Temp0,Form
    srl     TP.chr.high,Temp0,26
    srl     TP.chr.wide,Temp0,20
    and     TP.chr.wide,TP.chr.wide,63
    srl     TP.chr.inset,Temp0,14
    and     TP.chr.inset,TP.chr.inset,63
    sub     TP.chr.inset,TP.chr.inset,32
    srl     TP.chr.ascent,Temp0,7
    and     TP.chr.ascent,TP.chr.ascent,127
    sub     TP.chr.ascent,TP.chr.ascent,32
    and     TP.chr.pitch,Temp0,127

Routine S_M1_01 is called to convert the destination
coordinates to a linear address and alignment. The inset
is added to the x position and the ascent is added to the y
position.

    add     LP.loc.x,Pos.x,TP.chr.inset
    call    ret,S_M1_01
    add     LP.loc.y,Pos.y,TP.chr.ascent
    sll     TP.ptn.next,TP.chr.wide,GP.mem.depth
    sub     TP.ptn.next,GP.mem.width,TP.ptn.next
    const   TP.ptn.shift_set,(0x80000000 + 30)
    consth  TP.ptn.shift_set,(0x80000000 + 30)
    sub     TP.ptn.high,TP.chr.high,2
    sub     TP.ptn.wide,TP.chr.wide,2



Variable TP.ptn.next is set to the value necessary to
increment from the last pixel of a scan line in the
character cell to the first pixel of the next scan line.
Variable TP.ptn.shift_set is initialized to count bits in the
pattern words, and is loaded into TP.ptn.shift_count each
time a new pattern word is fetched. The initial value is
negative. TP.ptn.high and TP.ptn.wide are set to
TP.chr.high - 2 and TP.chr.wide - 2, respectively.

The character-drawing loop begins at label L_01. The
pointer in Form is moved to the first pattern word, and
the word is loaded into TP.ptn.mask. TP.ptn.shift_count
is renewed from TP.ptn.shift_set, and the code jumps to
L_03.



L_01:
    add     Form,Form,4
    load    0,0,TP.ptn.mask,Form
    jmp     L_03
    add     TP.ptn.shift_count,TP.ptn.shift_set,0

L_02:
    jmpfdec TP.ptn.shift_count,L_01
    sll     TP.ptn.mask,TP.ptn.mask,1
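Seeding TP.ptn.shift_count with 0x80000000 + 30 is what makes each pattern word supply exactly 32 pixels. The C model below assumes that JMPFDEC branches when bit 31 of its register is 0 and decrements the register either way, which is the behavior the scan-line loops in these listings rely on; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed JMPFDEC behavior: the branch is taken when bit 31 of the register
   is 0, and the register is decremented whether or not it is taken. */
int jmpfdec(uint32_t *reg)
{
    int taken = (*reg & 0x80000000u) == 0;
    *reg -= 1u;
    return taken;
}

/* Count the pattern bits examined per word, given TP.ptn.shift_set. */
unsigned bits_per_pattern_word(uint32_t shift_set)
{
    uint32_t count = shift_set;   /* renewed at L_01 after each load        */
    unsigned tested = 1;          /* bit 31 is tested at L_03 after the load */
    while (!jmpfdec(&count))      /* L_02 not taken: shift and test again   */
        tested++;
    return tested;                /* taken: back to L_01 for the next word  */
}
```

Starting from 0x8000001E, the counter stays "negative" for 31 tests, so 31 shifted bits plus the bit tested right after the load make 32.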

At label L_03, the high-order bit of the current mask
word is tested. If it is a zero, the code jumps to L_04. If it
is necessary to write both a foreground and background
color, one would add code to write a background color
instead of just jumping to L_04. If the bit in the mask
register is a 1, the current pixel color is written into the
current pixel location.

L_03:
    jmpf    TP.ptn.mask,L_04
    nop
    store   0,0,GP.pxl.value,LP.loc.addr

At label L_04, variable TP.ptn.wide is decremented and
tested. If it does not become negative, the destination
scan line need not change. The pixel location is
incremented to the next pixel, and the code returns to
label L_02, where it tests for bits remaining in the current
mask word.

L_04:
    jmpfdec TP.ptn.wide,L_02
    add     LP.loc.addr,LP.loc.addr,PIXEL_SIZE
    add     LP.loc.addr,LP.loc.addr,TP.ptn.next
    jmpfdec TP.ptn.high,L_02
    sub     TP.ptn.wide,TP.chr.wide,2
    add     v0,Pos.x,TP.chr.pitch
    add     v1,Pos.y,0

If TP.ptn.wide becomes negative, the current scan line
is complete. The current pixel location is adjusted to the
first pixel of the next scan line. TP.ptn.high is tested to
determine if more scan lines are necessary. If not, the
routine returns the position of the next character (in v0
and v1) and exits. If it is necessary to process more scan
lines, TP.ptn.wide is renewed, and the code jumps to L_02.

At label L_02, TP.ptn.shift_count is decremented and 
tested. If the current pattern word is not exhausted, it is 
left-shifted and tested at L_03, as described above. If 
the current pattern word is exhausted, the code contin- 
ues at label L_01, where the next pattern word is 
fetched. 
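The whole P_T1_01.S loop can be mirrored in C to show how the bit-serial scan, the row counter, and the pitch fit together. This is an illustrative sketch, not a translation of the listing: it assumes an 8-bit-per-pixel frame buffer whose row 0 is at the top (so the top of the character lies `ascent` rows above the base row), and `draw_char` and its parameters are hypothetical. Like P_T1_01.S, it performs no clipping, so the caller must keep the character cell inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Draw a character form {header, pattern words...} into an 8-bit frame buffer
   with `stride` bytes per row. (x, base_row) is the character position.
   Returns the x position of the next character. No clipping is done. */
int draw_char(uint8_t *fb, size_t stride, int x, int base_row,
              const uint32_t *form, uint8_t color)
{
    uint32_t hdr = form[0];
    unsigned high = hdr >> 26;
    unsigned wide = (hdr >> 20) & 63u;
    int inset = (int)((hdr >> 14) & 63u) - 32;
    int ascent = (int)((hdr >> 7) & 127u) - 32;
    unsigned pitch = hdr & 127u;

    assert(base_row - ascent >= 0 && x + inset >= 0);
    const uint32_t *pat = form + 1;
    uint32_t mask = 0;
    unsigned bits_left = 0;
    uint8_t *top = fb + (size_t)(base_row - ascent) * stride + (size_t)(x + inset);

    for (unsigned r = 0; r < high; r++) {
        uint8_t *row = top + (size_t)r * stride;
        for (unsigned c = 0; c < wide; c++) {
            if (bits_left == 0) {          /* L_01: fetch the next pattern word */
                mask = *pat++;
                bits_left = 32;
            }
            if (mask & 0x80000000u)        /* L_03: foreground pixel only */
                row[c] = color;
            mask <<= 1;                    /* L_02: move to the next bit */
            bits_left--;
        }
    }
    return x + (int)pitch;                 /* like v0 = Pos.x + pitch */
}
```

Writing a background color for 0 bits, as the text suggests, would only need an `else` branch at the foreground test.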

P_T1_02.S 

Routine P_T1_02.S draws the indicated character into a
color bit map at the specified location, with clipping. The
routine begins with the normal global functions.

This routine is almost exactly the same as P_T1_01.S
described above, except for clipping. The clipping is
performed in the same way as described for P_L1_02.S.
Just prior to writing a pixel, the routine asserts that the
pixel is inside the clipping window. If the pixel is not
inside the window, it is not drawn. If no subsequent pixels
could be inside the window, the routine terminates.

Nine parameters are loaded from the structure
G29K_Params.



GP.wnd.min_x      lr2
GP.wnd.max_x      lr3
GP.wnd.min_y      lr4
GP.wnd.max_y      lr5
GP.pxl.value      lr6
GP.mem.width      lr7
GP.mem.depth      lr8
GP.wnd.base       lr9
GP.wnd.align      lr10



    const   Temp0,_G29K_Params
    consth  Temp0,_G29K_Params
    mtsrim  cr,(9 - 1)
    loadm   0,0,GP.wnd.min_x,Temp0

The first word of the character form is loaded into Temp0
and divided into its five components. The inset and
ascent are offset by 32 and must be adjusted.

    load    0,0,Temp0,Form
    srl     TP.chr.high,Temp0,26
    srl     TP.chr.wide,Temp0,20
    and     TP.chr.wide,TP.chr.wide,63
    srl     TP.chr.inset,Temp0,14
    and     TP.chr.inset,TP.chr.inset,63
    sub     TP.chr.inset,TP.chr.inset,32
    srl     TP.chr.ascent,Temp0,7
    and     TP.chr.ascent,TP.chr.ascent,127
    sub     TP.chr.ascent,TP.chr.ascent,32
    and     TP.chr.pitch,Temp0,127



Routine S_M1_01 is called to convert the destination
coordinates to a linear address and alignment. The inset
is added to the x position and the ascent is added to the y
position.

    add     LP.loc.x,Pos.x,TP.chr.inset
    call    ret,S_M1_01
    add     LP.loc.y,Pos.y,TP.chr.ascent
    sll     TP.ptn.next,TP.chr.wide,GP.mem.depth
    sub     TP.ptn.next,GP.mem.width,TP.ptn.next
    const   TP.ptn.shift_set,(0x80000000 + 30)
    consth  TP.ptn.shift_set,(0x80000000 + 30)
    sub     TP.ptn.high,TP.chr.high,2
    sub     TP.ptn.wide,TP.chr.wide,2
    const   LP.clp.skip_vec,L_04
    consth  LP.clp.skip_vec,L_04
    const   LP.clp.stop_vec,L_05
    consth  LP.clp.stop_vec,L_05

Variable TP.ptn.next is set to the value necessary to
increment from the last pixel of a scan line in the
character cell to the first pixel of the next scan line.
Variable TP.ptn.shift_set is initialized to count bits in the
pattern words. It is loaded into TP.ptn.shift_count each
time a new pattern word is fetched. The initial value is
negative. TP.ptn.high and TP.ptn.wide are set to
TP.chr.high - 2 and TP.chr.wide - 2, respectively.

The character-drawing loop begins at label L_01. The
pointer in Form is moved to the first pattern word and
the word is loaded into TP.ptn.mask. The variable
TP.ptn.shift_count is renewed from TP.ptn.shift_set,
and the code jumps to L_03.



L_01:
    add     Form,Form,4
    load    0,0,TP.ptn.mask,Form
    jmp     L_03
    add     TP.ptn.shift_count,TP.ptn.shift_set,0



At label L_03, the high-order bit of the current mask
word is tested. If it is a 0, the code jumps to L_04. If it is
necessary to write both a foreground and background
color, one would add code to write a background color
instead of simply jumping to L_04.

L_03:
    jmpf    TP.ptn.mask,L_04
    nop
    asle    V_CLIP_SKIP,LP.loc.y,GP.wnd.max_y
    asge    V_CLIP_SKIP,LP.loc.x,GP.wnd.min_x
    asle    V_CLIP_SKIP,LP.loc.x,GP.wnd.max_x
    asge    V_CLIP_STOP,LP.loc.y,GP.wnd.min_y
    store   0,0,GP.pxl.value,LP.loc.addr

If the bit in the mask register is a 1, the current pixel color
is written into the current pixel location, if it is within the
clipping window. If any of the first three asserts fails, the
pixel is outside the window, but further pixels in the
character cell may still be in the window. In this case, the
store is skipped. If the fourth assert fails, the pixel is below
the window, which means no more pixels can possibly be
in the window. In this case, the routine terminates.
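The order of the four asserts matters, because the first one that fails decides whether a pixel is merely skipped or the whole character is abandoned. This C sketch (hypothetical names) applies the same comparisons in the same order. Like the listing, it assumes y decreases as drawing moves down the cell, so once a pixel inside the horizontal range lies below GP.wnd.min_y, every later pixel is below the window too.

```c
#include <assert.h>

typedef struct { int min_x, max_x, min_y, max_y; } clip_window;

enum clip_action { CLIP_DRAW, CLIP_SKIP, CLIP_STOP };

/* Same tests, in the same order, as the asserts before the store; the first
   failing test decides the outcome. */
enum clip_action classify_pixel(const clip_window *w, int x, int y)
{
    assert(w);
    if (!(y <= w->max_y)) return CLIP_SKIP;   /* asle V_CLIP_SKIP,y,max_y */
    if (!(x >= w->min_x)) return CLIP_SKIP;   /* asge V_CLIP_SKIP,x,min_x */
    if (!(x <= w->max_x)) return CLIP_SKIP;   /* asle V_CLIP_SKIP,x,max_x */
    if (!(y >= w->min_y)) return CLIP_STOP;   /* asge V_CLIP_STOP,y,min_y */
    return CLIP_DRAW;
}
```

A pixel that is both left of and below the window is only skipped, because the horizontal test fails first; the character is abandoned at the first lit pixel that is inside the horizontal range but below the window.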

At label L_04, variable TP.ptn.wide is decremented and
tested. If it is not negative, the destination scan line need
not change. The pixel location is incremented to the next
pixel, and the code returns to label L_02, where it tests
for bits remaining in the current mask word.

L_04:
    add     LP.loc.x,LP.loc.x,1
    jmpfdec TP.ptn.wide,L_02
    add     LP.loc.addr,LP.loc.addr,PIXEL_SIZE
    add     LP.loc.x,Pos.x,TP.chr.inset
    sub     LP.loc.y,LP.loc.y,1
    add     LP.loc.addr,LP.loc.addr,TP.ptn.next
    jmpfdec TP.ptn.high,L_02
    sub     TP.ptn.wide,TP.chr.wide,2

L_05:
    add     v0,Pos.x,TP.chr.pitch
    add     v1,Pos.y,0

L_02:
    jmpfdec TP.ptn.shift_count,L_01
    sll     TP.ptn.mask,TP.ptn.mask,1



When TP.ptn.wide becomes negative, the current scan
line is complete. The current pixel location is adjusted to
the first pixel of the next scan line. TP.ptn.high is tested
to determine if more scan lines are necessary. If not, the
routine returns the position of the next character (in v0
and v1) and then exits.

If it is necessary to process more scan lines,
TP.ptn.wide is renewed, and the code jumps to L_02.
At label L_02, TP.ptn.shift_count is decremented and
tested. If the current pattern word is not exhausted, it is
left-shifted and tested at L_03, as described above. If
the current pattern word is exhausted, the code
continues at label L_01, where the next pattern word is
fetched.

Filled-Triangle Routines 

There are eight functions for filled triangles. They are
shown in Table 11.

Table 11. Routines for Filled Triangles

Routine      Function
P_S1_01.S    Shaded triangle, no clipping
P_S1_02.S    Shaded triangle, with clipping
P_F1_01.S    Solid filled triangle, no clipping
P_F1_02.S    Solid filled triangle, with clipping
P_F2_01.S    General filled triangle, no clipping
P_F2_02.S    General filled triangle, with clipping
P_F3_01.S    Monochrome filled triangle, no clipping
P_F4_01.S    General monochrome triangle, no clipping

Shaded Triangles 

All shaded-triangle routines begin with similar declara- 
tions. The function name is declared to be global, the 
ENTER macro is used to specify that the appropriate 
general registers are required, and the routine name ap- 
pears as a label. 

.global _P_S1_01
ENTER SHADE_PRIMITIVE 
_P_S1_01: 

The nine parameter register names are declared with 
PARAM macros. These assign local-register numbers 
higher than (or above) the registers previously defined. 
These parameters are passed in registers. 



PARAM P1.x
PARAM P1.y
PARAM I1
PARAM P2.x
PARAM P2.y
PARAM I2
PARAM P3.x
PARAM P3.y
PARAM I3

P(n).x, P(n).y, and I(n) together specify the x,y
coordinates and the intensity of a point. The triangle must
be specified with three points.

The CLAIM macro is the function prologue. If a spill op-
eration is not necessary, this consists of five instruc- 
tions. If a spill is necessary, the standard SPILL routine 
is used, which may involve a Load/Store Multiple. 

Filled Triangles

The filled-triangle routines begin with similar declara- 
tions. The function name is declared to be global, the 
ENTER macro is used to specify that the appropriate 
general registers are required, and the routine name ap- 
pears as a label. 

.global _P_F1_01
ENTER FILL_PRIMITIVE 
_P_F1_01: 

The six parameter register names are declared with
PARAM macros. These assign local-register numbers
higher than (or above) the registers previously defined.
These parameters are passed in registers.



PARAM P1.x
PARAM P1.y
PARAM P2.x
PARAM P2.y
PARAM P3.x
PARAM P3.y



Pn.x and Pn.y together specify the x,y coordinates of
a vertex. The triangle must be specified with three
vertices.

The CLAIM macro is the function prologue. If a spill op- 
eration is not necessary, this consists of five instruc- 
tions. If a spill is necessary, the standard SPILL routine 
is used, which may involve a Load/Store Multiple. 

Table 12 shows the input parameters that must be
present before each routine can be called.
Table 12. Local-Register Usage in Filled-Triangle Routines

Variable         Reg   S1_01 S1_02 F1_01 F1_02 F2_01 F2_02 F3_01 F4_01
GP.wnd.min_x     lr2           X           X           X
GP.wnd.max_x     lr3           X           X           X
GP.wnd.min_y     lr4           X           X           X
GP.wnd.max_y     lr5           X           X           X
GP.pxl.value     lr6     X     X     X     X     X     X     X     X
GP.mem.width     lr7     X     X     X     X     X     X     X     X
GP.mem.depth     lr8     X     X     X     X     X     X     X     X
GP.wnd.base      lr9     X     X     X     X     X     X     X
GP.wnd.align     lr10    X     X     X     X     X     X     X
GP.pxl.op_vec    lr11    X     X                 X     X
GP.pxl.in_mask   lr12    X     X                 X     X
GP.pxl.do_mask   lr13    X     X                 X     X
GP.pxl.do_value  lr14    X     X                 X     X
GP.pxl.out_mask  lr15    X     X                 X     X






BENCHMARKS 

The purpose of this section is to present some bench- 
marks for the Am29000 as a graphics processor. Ren- 
dering times are presented for a number of fundamental 
operations, including line drawing (vectors), BITBLT, 
strings, and triangle fill. The benchmarks were obtained 
by running the programs on the Architectural Simulator. 

Each benchmarking program performs the following 
basic operations: 

1. Initialize the bit map if necessary.

2. Read cycle counter.

3. Call a null function n times.

4. Read cycle counter. Calculate overhead.

5. Call the object function n times.

6. Read cycle counter. Calculate actual.

7. Print results (overhead, actual, actual minus overhead).
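Steps 2 through 6 can be packaged as a small harness. In this C sketch the counter is passed in as a function pointer so the harness can be exercised away from the Am29000. `run_bench`, `fake_now`, `fake_null`, and `fake_vector` are hypothetical, and the stand-in counter simply advances by a fixed amount per call rather than counting real machine cycles.

```c
#include <assert.h>

typedef unsigned long (*counter_fn)(void);   /* reads a free-running cycle counter */
typedef void (*call_fn)(int i);              /* function called n times            */

typedef struct {
    unsigned long overhead;   /* cycles spent calling the null function   */
    unsigned long actual;     /* cycles spent calling the object function */
} bench_result;

/* Steps 2 to 6 of the procedure. */
bench_result run_bench(counter_fn now, call_fn null_fn, call_fn object_fn, int n)
{
    assert(n > 0);
    bench_result r;
    unsigned long t = now();              /* step 2 */
    for (int i = 0; i < n; i++)
        null_fn(i);                       /* step 3 */
    r.overhead = now() - t;               /* step 4 */
    t = now();
    for (int i = 0; i < n; i++)
        object_fn(i);                     /* step 5 */
    r.actual = now() - t;                 /* step 6 */
    return r;
}

/* Stand-in counter and workloads for trying the harness off-target. */
static unsigned long fake_ticks;
unsigned long fake_now(void) { return fake_ticks; }
void fake_null(int i) { (void)i; fake_ticks += 3; }
void fake_vector(int i) { (void)i; fake_ticks += 10; }
```

With 80 calls, the stand-in reports 240 cycles of overhead and 800 cycles in total, so actual minus overhead attributes 7 cycles per call to the object function itself.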

The bit map is initialized if its contents affect the
execution time, as is the case for some arithmetic operations.

Mem = (unsigned long *)BitMap;
Count = 65536;
while (Count--)
    *Mem++ = 0L;

In step 2, the Am29000 built-in cycle counter is read. 
This counter increments continuously, once every ma- 
chine cycle, whenever the Am29000 is running. 

Time = _cycles();



A null function is called the same number of times that the actual function is called.

Items = 0;
for (EndX = 0, EndY = 10; EndX < 10; ++EndX)
{
    T_Empty(0, 0, EndX, EndY);
    ++Items;
}


The cycle counter is read to determine the number of cy- 
cles that are spent calling the null function. Then, a new 
base time is obtained. 

Over = _cycles () - Time; 
Time = _cycles () ; 

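The object function is then called n times. Items is reset first so that it counts only the object-function calls used in the average.

Items = 0;
for (EndX = 0, EndY = 10; EndX < 10; ++EndX)
{
    /* draw a line */
    P_L1_01(0, 0, EndX, EndY);
    ++Items;
}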

The cycle counter is read again, and the actual time is determined.

Time = _cycles() - Time;



Finally, a report is printed (see Listing 1). An actual printout looks like Listing 2.
Listing 1. Benchmark Program Listing

Avrg = ((Time - Over) + Items / 2) / Items;   /* round to nearest */

printf("Time_001: %6u cycles/vector ", Avrg);
printf("[ %5u vectors : %-20s ]\n", Items, "10 pixels each");
printf(" %10u (actual) - %10u (overhead) = %10u cycles\n",
       Time, Over, Time - Over);
printf(" -- (10-pixel vectors with P_L1_01, direct, ");
printf("unclipped) --\n");
printf(" \n");



Listing 2. Benchmark Printout

Time_001: 125 cycles/vector [ 80 vectors : 10 pixels each ]
338972 (actual) - 328952 (overhead) = 10020 cycles
-- (10-pixel vectors with P_L1_01, direct, unclipped) --



Hardware Models 

Three hardware models are benchmarked. In each 
case, the cycle time is 40 ns. 

The first model is the Personal Computer Execution Board (PCEB29K), with limited burst-mode capability and the Branch Target Cache disabled.

The second model is a typical mid-range system, with two-cycle first access and single-cycle burst. Such a system could be implemented using an instruction cache, or with an interleaved static memory, as described in the Am29000 Memory Design Handbook.

The third model is a very fast, single-cycle system. 

The memory parameters for each model are given in 
Table 13. 

Table 13. Memory Parameters (Wait States) for Hardware Models

                               Model
                     PCEB    Mid-Range    Fast
I-Fetch (First)        5         2          1
I-Fetch (Burst)        1         1        (n/a)
D-Fetch (First)        4         4          1
D-Fetch (Burst)      (n/a)       1        (n/a)



Benchmark Results 

Benchmark results are presented in Figures 13 through 
27 and are discussed in the subsections below. 

Vectors 

A total of 11 numbers are reported for each of the three models. The metric is vectors per second, using 10-pixel, randomly oriented vectors with both endpoints specified for each vector. All numbers are for 32-bit pixels, except C10, which is monochrome. The numbers include both setup and actual pixel-drawing time. The results are presented graphically.

Figure 13 shows the drawing performance in vectors per second for single-width vectors. Figure 14 shows the performance for wide and anti-aliased lines.
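With a 40-ns cycle time the processor runs 25,000,000 cycles per second, so each rate in the figures should equal 25,000,000 divided by the cycles spent per operation. The published rates appear consistent with this; for example, 192,308 vectors/sec (C1, Fast) corresponds to 130 cycles per vector. The sketch below converts in both directions; the function names are illustrative, and the integer cycle counts are inferred from the published rates rather than quoted from the handbook.

```c
/* A 40-ns cycle time gives 25,000,000 cycles per second. */
#define CYCLES_PER_SEC 25000000UL

/* Operations per second for an operation costing cycles_per_op cycles,
   rounded to the nearest integer. */
static unsigned long ops_per_sec(unsigned long cycles_per_op)
{
    return (CYCLES_PER_SEC + cycles_per_op / 2) / cycles_per_op;
}

/* Inverse: the cycle cost implied by a published rate, rounded. */
static unsigned long cycles_from_rate(unsigned long ops_per_second)
{
    return (CYCLES_PER_SEC + ops_per_second / 2) / ops_per_second;
}
```

For case C1 (P_L1_01), ops_per_sec(287), ops_per_sec(142), and ops_per_sec(130) give 87,108, 176,056, and 192,308 vectors/sec, the PCEB29K, Medium, and Fast entries in Figure 13.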
Case   PCEB29K    Medium      Fast
C1      87,108   176,056   192,308
C2      65,963   110,619   117,925
C3      41,254    77,882    91,575
C4      47,259   102,459   127,551
C5      37,879    72,674    84,746
C10     45,372   101,215   119,617
C11    100,806   250,000   268,817
(Vectors/Sec)

Case   Routine    Function
C1     P_L1_01    Unclipped, set
C2     P_L1_02    Clipped, set
C3     P_L2_01    Unclipped, XOR (restricted)
C4     P_L2_01    Unclipped, XOR (unrestricted)
C5     P_L2_02    Clipped, XOR (unrestricted)
C10    P_L4_01    Unclipped, monochrome, set
C11    P_L5_01    Unclipped, fixed width, no window, set

Figure 13. Benchmark Results for Single-Width Line Functions (Vectors/Sec)
Case   PCEB29K   Medium     Fast
C6      12,880   19,577   21,386
C7      16,689   28,027   29,274
C8      13,959   24,062   25,304
C9      14,393   22,810   23,607
(Vectors/Sec)

Case   Routine    Function
C6     P_L3_01    Unclipped, anti-aliased, max, width = 1
C7     P_L3_01    Unclipped, anti-aliased, set, width = 1
C8     P_L3_01    Unclipped, anti-aliased, set, width = 2
C9     P_L3_02    Clipped, anti-aliased, set, width = 1

Figure 14. Benchmark Results for Wide/AA Line Functions (Vectors/Sec)
BITBLT

A total of 16 numbers are reported for each of the three models. There are four variables, each with two cases. The variables are:

Variable       Case 1     Case 2
Block Size     16 x 16    256 x 256
Bits/Pixel     1          32
Clipping       Off        On
Operation      Copy       XOR (monochrome) / Add with Saturation (color)

The benchmark performance for BITBLT is summarized in Figures 15 through 22.
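The bits-per-second figures restate the block rates: the pixels in a block times the bits per pixel, divided by the time per block. A small sketch of that conversion follows, again assuming 25,000,000 cycles per second. The function names are illustrative, and the cycle counts used in the checks (871 and 990 cycles per 16x16, 32-bit block, for C30 on the Fast and Medium models) are inferred from the published figures, not taken from the handbook.

```c
#include <stdint.h>

#define CYCLES_PER_SEC 25000000ULL   /* 40-ns cycle time */

/* Bits moved by one block operation. */
static uint64_t bits_per_block(unsigned width, unsigned height, unsigned bpp)
{
    return (uint64_t)width * height * bpp;
}

/* Blocks per second for a given cost per block, rounded to nearest. */
static uint64_t blocks_per_sec(uint64_t cycles_per_block)
{
    return (CYCLES_PER_SEC + cycles_per_block / 2) / cycles_per_block;
}

/* Bits per second, computed from the exact block rate (not from the
   rounded blocks/sec value) and rounded to nearest. */
static uint64_t bits_per_sec(uint64_t bits, uint64_t cycles_per_block)
{
    return (bits * CYCLES_PER_SEC + cycles_per_block / 2) / cycles_per_block;
}
```

At 871 cycles per block, a 16x16, 32-bit block (8,192 bits) gives 28,703 blocks/sec and 235,132,032 bits/sec, matching the C30 Fast entries.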



Case   PCEB      Medium     Fast
C34    14,775    25,934    32,216
C35    14,269    25,407    31,807
C36    11,579    23,020    26,767
C37    11,251    22,748    26,288
(Blocks/Sec)

Case   Routine    Function
C34    P_B3_01    Copy, unclipped
C35    P_B3_02    Copy, clipped
C36    P_B4_01    XOR, unclipped
C37    P_B4_02    XOR, clipped

Figure 15. Benchmark Results for BITBLT 16x16 Monochrome Functions (Blocks/Sec)
Case   PCEB         Medium       Fast
C34    3,782,506    6,639,004    8,247,423
C35    3,652,968    6,504,065    8,142,494
C36    2,964,335    5,893,186    6,852,248
C37    2,880,288    5,823,476    6,729,758
(Bits/Sec)

Case   Routine    Function
C34    P_B3_01    Copy, unclipped
C35    P_B3_02    Copy, clipped
C36    P_B4_01    XOR, unclipped
C37    P_B4_02    XOR, clipped

Figure 16. Benchmark Results for BITBLT 16x16 Monochrome Functions (Bits/Sec)
Case   PCEB29K   Medium     Fast
C30    7,485     25,253    28,703
C31    7,403     24,534    27,964
C32    3,976      9,913    10,684
C33    3,922      9,827    10,566
(Blocks/Sec)

Case   Routine    Function
C30    P_B1_01    Copy, unclipped
C31    P_B1_02    Copy, clipped
C32    P_B2_01    Add w/saturation, unclipped
C33    P_B2_02    Add w/saturation, clipped

Figure 17. Benchmark Results for BITBLT 16x16 Color Functions (Blocks/Sec)
Case   PCEB          Medium         Fast
C30    61,317,365    206,868,687    235,132,032
C31    60,645,543    200,981,354    229,082,774
C32    32,575,155     81,205,393     87,521,368
C33    32,125,490     80,503,145     86,559,594
(Bits/Sec)

Case   Routine    Function
C30    P_B1_01    Copy, unclipped
C31    P_B1_02    Copy, clipped
C32    P_B2_01    Add w/saturation, unclipped
C33    P_B2_02    Add w/saturation, clipped

Figure 18. Benchmark Results for BITBLT 16x16 Color Functions (Bits/Sec)
Case   PCEB29K   Medium    Fast
C42    558       1,331     1,618
C43    558       1,331     1,617
C44    392       1,035     1,157
C45    378         972     1,068
(Blocks/Sec)

Case   Routine    Function
C42    P_B3_01    Copy, unclipped
C43    P_B3_02    Copy, clipped
C44    P_B4_01    XOR, unclipped
C45    P_B4_02    XOR, clipped

Figure 19. Benchmark Results for BITBLT 256x256 Monochrome Functions (Blocks/Sec)
Case   PCEB29K       Medium        Fast
C42    36,595,117    87,251,038    106,059,037
C43    36,553,478    87,237,101    105,976,714
C44    25,666,975    67,839,841     75,802,720
C45    24,775,442    63,701,400     70,002,136
(Bits/Sec)

Case   Routine    Function
C42    P_B3_01    Copy, unclipped
C43    P_B3_02    Copy, clipped
C44    P_B4_01    XOR, unclipped
C45    P_B4_02    XOR, clipped

Figure 20. Benchmark Results for BITBLT 256x256 Monochrome Functions (Bits/Sec)
Case   PCEB29K   Medium   Fast
C38    35        157      170
C39    35        157      170
C40    18         48       50
C41    18         48       50
(Blocks/Sec)

Case   Routine    Function
C38    P_B1_01    Copy, unclipped
C39    P_B1_02    Copy, clipped
C40    P_B2_01    Add w/saturation, unclipped
C41    P_B2_02    Add w/saturation, clipped

Figure 21. Benchmark Results for BITBLT 256x256 Color Functions (Blocks/Sec)
Case   PCEB29K       Medium         Fast
C38    73,920,216    328,312,001    355,741,320
C39    73,914,380    328,252,390    355,685,812
C40    37,791,752    101,590,254    105,362,074
C41    37,794,776    101,587,695    105,357,416
(Bits/Sec)

Case   Routine    Function
C38    P_B1_01    Copy, unclipped
C39    P_B1_02    Copy, clipped
C40    P_B2_01    Add w/saturation, unclipped
C41    P_B2_02    Add w/saturation, clipped

Figure 22. Benchmark Results for BITBLT 256x256 Color Functions (Bits/Sec)
Text

The performance of text representation was bench- 
marked using a single case, in which each character is 
represented as a 7-by-9-pixel matrix. The results of the 
text benchmark are shown in Figure 23. 
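A character cell of this kind is typically drawn by testing one pattern bit per pixel. The sketch below is a hypothetical illustration only: the draw_char() name, the one-byte-per-row font layout (bit 6 as the leftmost pixel), and the 32-bit frame buffer addressed with a pitch in pixels are assumptions, not the handbook's text routines.

```c
#include <stdint.h>

#define CHR_WIDE 7   /* pixels per character row */
#define CHR_HIGH 9   /* rows per character */

/* Draw one 7x9 character into a 32-bit-per-pixel frame buffer.
   Assumed font layout: one byte per row, bit 6 = leftmost pixel.
   Pixels whose pattern bit is 1 are set to 'value'; the others are
   left unchanged (transparent background). 'pitch' is in pixels. */
static void draw_char(uint32_t *fb, unsigned pitch, unsigned x, unsigned y,
                      const uint8_t pattern[CHR_HIGH], uint32_t value)
{
    unsigned r, c;

    for (r = 0; r < CHR_HIGH; ++r) {
        uint32_t *dst = fb + (y + r) * pitch + x;
        for (c = 0; c < CHR_WIDE; ++c)
            if (pattern[r] & (0x40u >> c))
                dst[c] = value;
    }
}
```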

Filled Triangles 

Two sets of benchmarks were run for filled triangles. The small triangles have 10-pixel sides, and the large triangles have 50-pixel sides. The shading is linear along scan lines (Gouraud shading).

The benchmark results for filled triangles, given in triangles per second, are summarized in Figures 24 through 27.
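Linear shading along a scan line needs only one addition per pixel once the per-pixel increment is known. A minimal sketch of such a span fill follows. It is hypothetical (the shade_span() name, the 8-bit intensity buffer, and the argument order are assumptions, not the handbook's P_S1_01 routine) and carries the intensity in the 16.16 scaled format described under "Scaled Arithmetic."

```c
#include <stdint.h>

/* Hypothetical span filler: writes 8-bit intensities for pixels x0..x1
   of one scan line, interpolating linearly from i0 (at x0) to i1 (at x1).
   The intensity is kept in 16.16 scaled form, so each pixel needs only
   one addition and one shift. Assumes 0 <= i0, i1 <= 255 and x0 <= x1. */
static void shade_span(uint8_t *row, int x0, int x1, int i0, int i1)
{
    int32_t value = (int32_t)i0 * 65536;          /* i0 in 16.16 form */
    int32_t step = 0;
    int x;

    /* Division truncates toward zero, so the running value never
       overshoots i1 and stays within 0..255. */
    if (x1 > x0)
        step = ((int32_t)(i1 - i0) * 65536) / (x1 - x0);

    for (x = x0; x <= x1; ++x) {
        row[x] = (uint8_t)(value >> 16);          /* integer part */
        value += step;
    }
}
```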



Model      Characters/Sec
Medium     37,821
Fast       42,159

Figure 23. Benchmark Results for Text Functions (Characters/Sec)



Case   PCEB29K   Medium    Fast
C50    6,792     12,389    13,062
C52    6,241     11,008    11,457
C51      544      1,248     1,329
C53      500      1,056     1,111
(Triangles/Sec)

Case   Routine      Function
C50    P_S1_01.S    10-pixel sides, shaded, unclipped
C52    P_S1_02.S    10-pixel sides, shaded, clipped
C51    P_S1_01.S    50-pixel sides, shaded, unclipped
C53    P_S1_02.S    50-pixel sides, shaded, clipped

Figure 24. Benchmark Results for Shaded Triangle Functions (Triangles/Sec)
Case   PCEB29K   Medium    Fast
C60    15,924    28,802    31,847
C62    12,697    21,949    22,222
C61     1,530     3,893     4,773
C63     1,261     2,655     2,660
(Triangles/Sec)

Case   Routine      Function
C60    P_F1_01.S    10-pixel sides, solid direct, unclipped
C62    P_F1_02.S    10-pixel sides, solid direct, clipped
C61    P_F1_01.S    50-pixel sides, solid direct, unclipped
C63    P_F1_02.S    50-pixel sides, solid direct, clipped

Figure 25. Benchmark Results for Solid Direct Triangle Functions (Triangles/Sec)
Case   PCEB29K   Medium    Fast
C64    8,821     17,705    21,758
C66    7,879     14,828    17,385
C65      622      1,504     2,074
C67      588      1,273     1,656
(Triangles/Sec)

Case   Routine      Function
C64    P_F2_01.S    10-pixel sides, XOR, unclipped
C66    P_F2_02.S    10-pixel sides, XOR, clipped
C65    P_F2_01.S    50-pixel sides, XOR, unclipped
C67    P_F2_02.S    50-pixel sides, XOR, clipped

Figure 26. Benchmark Results for Solid XOR Triangle Functions (Triangles/Sec)
Case   PCEB29K   Medium    Fast
C68    15,366    27,533    29,656
C70    12,927    24,851    27,203
C69     3,912     7,583     8,300
C71     3,106     6,631     7,372
(Triangles/Sec)

Case   Routine      Function
C68    P_F3_01.S    10-pixel sides, monochrome, unclipped
C70    P_F4_01.S    10-pixel sides, monochrome XOR, unclipped
C69    P_F3_01.S    50-pixel sides, monochrome, unclipped
C71    P_F4_01.S    50-pixel sides, monochrome XOR, unclipped

Figure 27. Benchmark Results for Monochrome Triangle Functions (Triangles/Sec)
Summary

Table 14 summarizes the principal benchmark results above.

Table 14. Am29000 Graphics Performance

                                           Medium        Maximum
Primitive                        Case   Performance   Performance   Units
10-Pixel Vectors:
  Monochrome SET                 C10      101,215       119,617     Vectors/Sec
  2-32 Bits/Pixel SET            C11      250,000       268,817     Vectors/Sec
  2-32 Bits/Pixel XOR            C5        72,674        84,746     Vectors/Sec
  2-32 Bits/Pixel (AA)           C7        28,027        29,274     Vectors/Sec
16x16 BITBLTs:
  Monochrome Copy                C34       25,934        32,216     Blocks/Sec
  2-32 Bits/Pixel Copy           C30       25,253        28,703     Blocks/Sec
256x256 BITBLTs:
  Monochrome Copy                C42        1,331         1,618     Blocks/Sec
  2-32 Bits/Pixel Copy           C38          157           170     Blocks/Sec
Text (7x9):
  2-32 Bits/Pixel                          37,821        42,159     Characters/Sec
Filled Triangles (10-Pixel Sides):
  Shaded 2-32 Bits/Pixel         C50       12,389        13,062     Triangles/Sec
  Solid 2-32 Bits/Pixel          C60       28,802        31,847     Triangles/Sec
  Monochrome                     C68       27,533        29,656     Triangles/Sec



ADDITIONAL PERFORMANCE 
CONSIDERATIONS 

Pipelines 

The performance of graphics-processing systems can 
be significantly increased through the use of pipe- 
lining. This is because graphics-processing operations 
can easily be partitioned into sequential tasks, such as 
transformations, end-point determination, and render- 
ing. Since these tasks depend on the results of one an- 
other in only a single direction, each can be performed 
on a particular machine and the results passed on to the 
next. 

Figure 28 shows three Am29000s pipelined together to 
execute fast line drawing. 



Scaled Arithmetic 

In a system without a floating-point processor, scaled arithmetic can be used to avoid floating-point emulation.

The 3-D rendering examples in this handbook use scaled arithmetic: integer operations performed on real numbers that have an integer part and a fractional part.

The software in this handbook assumes a radix point between bits 16 and 15 of each word. Numbers therefore comprise 1 sign bit, 15 bits of integer part, and 16 bits of fractional part. This is shown in Figure 29.

For addition, subtraction, and comparison, the values are simply ordered lists of bits, and the assumed radix point introduces no special complication. Multiplication is slightly more complex. The standard multiply takes two 32-bit signed numbers and produces a 64-bit signed product. Since the two operands contribute a total of 32 fraction bits, the radix point of the product lies between bits 32 and 31. After the multiplication, the middle 32 bits (bits 47 through 16) are extracted, which leaves the radix point between bits 16 and 15 again. It is often unnecessary to execute a full 32-bit multiplication; when one operand is a constant, the same result can be obtained with a combination of shifts and adds.
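These rules translate directly into C. This is a sketch under stated assumptions: fix16, fix_add(), fix_mul(), and fix_mul10() are illustrative names, a 64-bit integer type stands in for the 64-bit product, and the right shift of a negative product is assumed to be arithmetic (true of common compilers, but implementation-defined in C).

```c
#include <stdint.h>

/* Scaled value: 1 sign bit, 15 integer bits, 16 fraction bits;
   the radix point lies between bits 16 and 15. */
typedef int32_t fix16;

#define FIX_ONE ((fix16)0x00010000)

/* Addition, subtraction, and comparison need no adjustment. */
static fix16 fix_add(fix16 a, fix16 b)
{
    return a + b;
}

/* Full multiply: the 64-bit product has 32 fraction bits, so keep the
   middle 32 bits (bits 47 through 16). */
static fix16 fix_mul(fix16 a, fix16 b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (fix16)(product >> 16);
}

/* Multiply by the constant 10 with shifts and adds: 10a = 8a + 2a.
   The shifts are done on an unsigned copy to avoid undefined behavior
   when a is negative. */
static fix16 fix_mul10(fix16 a)
{
    uint32_t u = (uint32_t)a;
    return (fix16)((u << 3) + (u << 1));
}
```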
Figure 28. Pipelining

Figure 29. Scaled Arithmetic



Hardware Assist 

There are always tradeoffs in dividing tasks between hardware and software. This handbook has assumed relatively simple hardware. This section discusses moving some activities into hardware. Although this will not make the primitives faster, it may speed up some ancillary operations.

For purposes of discussion, a typical bit map is defined as a 2K-wide by 1K-high buffer based on 256K x 4 VRAMs and an 81C458 Color Palette (see Figure 30). This bit map and serializer/palette combination is suitable for a 1280 x 1024, 60-Hz, non-interlaced display.

A 2K x 1K x 8 bit map based on 256K x 4 VRAMs requires 16 devices (2048 x 1024 x 8 bits = 16 Mbits, and each device holds 1 Mbit). The example bit map is wired so that each pixel is contained in two adjacent devices. It is possible to wire the bit map differently, but this method does not require masked writes. That, in turn, has the substantial advantage of not requiring additional buffers on the data bus to inject the write mask, and it also makes the timing generator slightly less complex.



Eight adjacent VRAM pairs contain eight horizontally 
adjacent pixels. Each VRAM pair contains every eighth 
pixel. 
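This wiring determines where each pixel lives. The sketch below is a hypothetical address decoder, not taken from the handbook: the struct and function names are invented, and it assumes that each 512-column VRAM row holds two consecutive scan lines of 256 columns each (even line in the lower half), which is consistent with the 512-row, 512-nibble organization used in the bit-map clearing discussion.

```c
/* Location of one pixel in the example 2K x 1K x 8 bit map:
   16 VRAMs (256K x 4) wired as 8 pairs, 512 rows x 512 columns each. */
struct vram_addr {
    unsigned pair;     /* 0..7: which VRAM pair holds the pixel */
    unsigned row;      /* 0..511: DRAM row address */
    unsigned column;   /* 0..511: DRAM column address */
};

/* Hypothetical mapping for pixel (x, y), 0 <= x < 2048, 0 <= y < 1024.
   Pair k holds every eighth pixel (x % 8 == k); each pair stores 256
   pixels per scan line, so one row holds two scan lines (assumed). */
static struct vram_addr pixel_to_vram(unsigned x, unsigned y)
{
    struct vram_addr a;

    a.pair   = x & 7u;                       /* eight adjacent pixels span the 8 pairs */
    a.column = ((y & 1u) << 8) | (x >> 3);   /* odd line in upper half of the row */
    a.row    = y >> 1;                       /* two scan lines per row */
    return a;
}
```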

VRAMs Used for Bit Maps

The memory bandwidth required to refresh the screen has long been a problem with bit maps. Even for a screen of modest resolution, the bandwidth can be hundreds of millions of bits per second, which can interfere significantly with the bit-map update process.
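For a sense of scale, the sketch below computes the refresh bandwidth of the example display (1280 x 1024 visible pixels, 8 bits per pixel, 60 Hz), counting visible pixels only and ignoring blanking intervals; the function name is illustrative. The result, 629,145,600 bits per second, is in line with the "hundreds of millions" stated above.

```c
#include <stdint.h>

/* Bits per second that screen refresh must read from the bit map,
   counting visible pixels only (blanking intervals are ignored). */
static uint64_t refresh_bits_per_sec(unsigned width, unsigned height,
                                     unsigned bits_per_pixel, unsigned hz)
{
    return (uint64_t)width * height * bits_per_pixel * hz;
}
```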

The VRAM can be thought of as a dual-port memory with one random-access port and one very specialized port. The random port is a standard dynamic RAM port with *RAS, *CAS, addresses, and *WE. In addition, *DT/OE is used to control transfers to and from an internal serializer. The second port can be accessed only serially. Once started, up to 256 or 512 nibbles per VRAM can be transferred independently of the standard port. Since VRAMs are typically arranged in parallel, only a single transfer cycle is required per scan line.
Figure 30. Frame Buffer



Because the raster traverses the screen in a very regu- 
lar manner, accesses to the bit map required for screen 
refresh are very regular. In fact, for a typical system, all 
the bits necessary to refresh an entire scan line share a 
common row address. 

In a bit-map application, the standard port is controlled 
by the graphics processor. It is used for bit-map update, 
dynamic memory refresh, and transfer cycles. The se- 
rial port is used for screen refresh. 

A typical VRAM configuration is shown in Figure 31.

Bit-Map Clearing

In many applications, the bit map must be cleared occa- 
sionally. This could be as seldom as once every few 
minutes for a CAD system, to as often as 30 or 60 times 
a second for a real-time animation system. When the bit 
map is cleared often, this task can take a good percent- 
age of the total available time and result in an appre- 
ciable reduction in performance. In some cases, it might 
be worth a modest investment in hardware to improve 
performance. 

Note that in Figure 31 the path between the serializer and the dynamic memory is bidirectional. This means that an entire row of memory can be loaded with some value (typically zero) per memory cycle. In the discussion below, we assume that the value zero produces a blank screen.

This approach requires two things. First, the pattern (0s) must be loaded into the serializer. Second, logic must be built to execute write transfer cycles. Since this operation requires the serializer, it must be done when the serializer is otherwise unused for a significant period of time; in practice, this means during the vertical blanking period.

The serializer can be set to zero by transferring a row from the dynamic portion of the VRAM. This requires that one row always be kept at zero, or be set to zero before the read transfer is executed. In our example bit map, a row is two (contiguous) scan lines. Keeping two scan lines at zero implies either not displaying them at all, which leaves 1022 scan lines on the screen, or displaying them as blanks. Using this method, the serializer can be set to zero in about 300 ns.

The serializer can also be set to zero by shifting in a row of 0s. This requires a pseudo-write-transfer cycle, followed by the clocking of 512 nibbles into the serializer. Using this method, the serializer can be set to zero in about 300 + 512 x 50 = 25,900 ns, or roughly 26 µs. The first method is clearly faster.

Once the serializer has been cleared to zero, 512 write transfer cycles are required to actually clear the bit map. At 300 ns per cycle, this requires about 154 µs, or about 1/100 of one frame time (16.7 ms at 60 Hz).
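The timing arithmetic above can be checked with a few lines of C. The constants are the figures quoted in the text, and the 60-Hz rate comes from the example display; the function names are illustrative. With these values the shift-in method takes 25,900 ns, clearing all 512 rows takes 153,600 ns, and that is about 0.92% of a 60-Hz frame.

```c
/* Timing figures quoted in the text. */
#define T_TRANSFER_NS 300UL   /* one read or write transfer cycle */
#define T_SHIFT_NS     50UL   /* one serial clock */
#define COLUMNS       512UL   /* nibbles per serializer */
#define ROWS          512UL   /* rows in the example bit map */

/* Second method: pseudo-write-transfer, then shift 512 zero nibbles in. */
static unsigned long shift_clear_ns(void)
{
    return T_TRANSFER_NS + COLUMNS * T_SHIFT_NS;
}

/* Clearing the whole bit map: one write transfer cycle per row. */
static unsigned long bitmap_clear_ns(void)
{
    return ROWS * T_TRANSFER_NS;
}

/* Share of one frame, in hundredths of a percent (truncated). */
static unsigned long frame_share_hundredths(unsigned long ns, unsigned long hz)
{
    unsigned long frame_ns = 1000000000UL / hz;
    return ns * 10000UL / frame_ns;
}
```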
Figure 31. VRAM Block Diagram



Saturation in Hardware

Consider a 24-plane bit map, in which the three color values are calculated individually as fixed-point real numbers. The format of the numbers is shown in Figure 29, in the section entitled "Scaled Arithmetic."

If the frame store already has 8-bit latches to hold the values before they are stored, the latches can be implemented as 22V10 PAL devices at a small incremental cost. This is shown in Figure 32.

In the PAL equation shown in Figure 32, bit 31 is the sign bit. Whenever it is set, a 0 is latched in the register, regardless of the state of any other bits. If bit 31 is not set, the bit in the register is set if the calculated bit is a 1, or if any of the first 4 overflow bits are set. Since each bit in the register has a similar set of equations, overflow forces a value of all 1s, and underflow forces a value of all 0s. If neither overflow nor underflow has occurred, the calculated value is placed in the register for storing into memory.
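The PAL equations amount to the following per-channel clamp, shown here as a software model in C. The bit positions are an assumption for illustration: the 8-bit color value is taken to be the low 8 integer bits of the scaled format (bits 23 through 16), with bits 27 through 24 as the four overflow bits tested; the actual assignment in Figure 32 may differ. As in the hardware, only those four overflow bits are examined, so larger positive values are assumed not to occur.

```c
#include <stdint.h>

/* Software model of the saturating latch for one color channel.
   Assumed layout: bit 31 = sign, bits 27..24 = the four overflow bits
   examined, bits 23..16 = the 8-bit color value. */
static uint8_t saturate_channel(uint32_t v)
{
    if (v & 0x80000000u)                   /* sign set: underflow, all 0s */
        return 0x00;
    if (v & 0x0F000000u)                   /* overflow bit set: all 1s */
        return 0xFF;
    return (uint8_t)((v >> 16) & 0xFFu);   /* in range: calculated value */
}
```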

Hardware Cycles

Memory transfer and refresh cycles can be performed either by the Am29000 in interrupt routines or by hardware. The advantage of software is that the memory responds only to the Am29000, so no arbiter is required. The disadvantage of software is that it adds overhead, and transfer cycles must take place during the horizontal blanking period, which means that the interrupt response must be guaranteed to be less than 5 to 8 µs.

The disadvantage of executing the refresh and transfer cycles in hardware is that the memory controller must include additional arbitration logic, and an additional cycle is required for arbitration.
Figure 32. Hardware Saturation
APPENDIX A: PROGRAM LISTINGS

G29K_REG.H

.eject

.sbttl "Register Names and Trap Definitions"

;*******************************************************************
;*                                                               **
;* g29k_reg.h              29000 Graphics Benchmarks             **
;*                                                               **
;* Copyright 1988 Advanced Micro Devices, Inc.                   **
;* Written by Gibbons and Associates, Inc.                       **
;*                                                               **
;*******************************************************************
;*                                                               **
;* Register names, trap definitions, and C function calling      **
;* convention macros.                                            **
;*                                                               **
;*******************************************************************



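;+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
;+                                                               ++
;+ Calling Convention Registers                                  ++
;+                                                               ++
;+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.reg rsp, gr1           ; register stack pointer
.reg tav, gr121         ; trap handler argument
.reg tpc, gr122         ; trap handler return
.reg lrp, gr123         ; large return pointer
.reg slp, gr124         ; static link pointer
.reg msp, gr125         ; memory stack pointer
.reg rsb, gr126         ; register spill bound
.reg rfb, gr127         ; register fill bound
.reg ffb, lr1           ; function frame bound
.reg ret, lr0           ; return address

;+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
;+                                                               ++
;+ User Registers (General Names)                                ++
;+                                                               ++
;+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
;|                                                               ||
;| Function return registers (16)                                ||
;|                                                               ||
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

.reg v0, gr96
.reg v1, gr97
.reg v2, gr98
.reg v3, gr99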
.reg v4, gr100
.reg v5, gr101
.reg v6, gr102
.reg v7, gr103
.reg v8, gr104
.reg v9, gr105
.reg v10, gr106
.reg v11, gr107
.reg v12, gr108
.reg v13, gr109
.reg v14, gr110
.reg v15, gr111



;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
;|                                                               ||
;| Volatile temporary registers (25)                             ||
;|                                                               ||
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

.reg t0, gr116
.reg t1, gr117
.reg t2, gr118
.reg t3, gr119
.reg t4, gr120
.reg t5, gr99           ; (v3)
.reg t6, gr98           ; (v2)
.reg t7, gr97           ; (v1)
.reg t8, gr96           ; (v0)
.reg t9, gr121          ; (tav)
.reg t10, gr122         ; (tpc)
.reg t11, gr123         ; (lrp)
.reg t12, gr124         ; (slp)
.reg t13, gr111         ; (v15)
.reg t14, gr110         ; (v14)
.reg t15, gr109         ; (v13)
.reg t16, gr108         ; (v12)
.reg t17, gr107         ; (v11)
.reg t18, gr106         ; (v10)
.reg t19, gr105         ; (v9)
.reg t20, gr104         ; (v8)
.reg t21, gr103         ; (v7)
.reg t22, gr102         ; (v6)
.reg t23, gr101         ; (v5)
.reg t24, gr100         ; (v4)



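;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
;|                                                               ||
;| Reserved registers (4)                                        ||
;|                                                               ||
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

.reg r0, gr112
.reg r1, gr113
.reg r2, gr114
.reg r3, gr115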
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
;|                                                               ||
;| Parameter registers (16)                                      ||
;|                                                               ||
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

.reg p0, lr2
.reg p1, lr3
.reg p2, lr4
.reg p3, lr5
.reg p4, lr6
.reg p5, lr7
.reg p6, lr8
.reg p7, lr9
.reg p8, lr10
.reg p9, lr11
.reg p10, lr12
.reg p11, lr13
.reg p12, lr14
.reg p13, lr15
.reg p14, lr16
.reg p15, lr17



;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
;|                                                               ||
;| Global control parameter registers                            ||
;|                                                               ||
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

.reg GP.wnd.min_x, lr2
.reg GP.wnd.max_x, lr3
.reg GP.wnd.min_y, lr4
.reg GP.wnd.max_y, lr5
.reg GP.pxl.value, lr6
.reg GP.mem.width, lr7
.reg GP.mem.depth, lr8
.reg GP.wnd.base, lr9
.reg GP.wnd.align, lr10
.reg GP.pxl.op_vec, lr11
.reg GP.pxl.in_mask, lr12
.reg GP.pxl.do_mask, lr13
.reg GP.pxl.do_value, lr14
.reg GP.pxl.out_mask, lr15
.reg GP.wid.actual, lr16
.reg GP.pxl.op_code, lr17
.reg GP.mem.base, lr18
.reg GP.wnd.origin_x, lr19
.reg GP.wnd.origin_y, lr20
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
;|                                                               ||
;| Line parameter registers                                      ||
;|                                                               ||
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

.reg LP.loc.x, lr21
.reg LP.loc.y, lr22
.reg LP.loc.addr, lr23
.reg LP.loc.align, lr24
.reg LP.wid.axial, lr25
.reg LP.wid.side_1, lr26
.reg LP.wid.side_2, lr27
.reg LP.gen.cover, lr28
.reg LP.gen.delta_p, lr29
.reg LP.gen.delta_s, lr30
.reg LP.gen.move_p, lr31
.reg LP.gen.move_s, lr32
.reg LP.gen.p, lr33
.reg LP.gen.s, lr34
.reg LP.gen.min_p, lr35
.reg LP.gen.max_p, lr36
.reg LP.gen.min_s, lr37
.reg LP.gen.max_s, lr38
.reg LP.gen.slope, lr39
.reg LP.gen.x_slope, lr40
.reg LP.gen.error, lr41
.reg LP.gen.x_error, lr42
.reg LP.gen.addr, lr43
.reg LP.gen.try_s, lr44
.reg LP.gen.count, lr45
.reg LP.clp.skip_vec, lr46
.reg LP.clp.stop_vec, lr47
.reg LP.clp.skip_p, lr48
.reg LP.clp.stop_p, lr49
.reg LP.clp.skip_s, lr50
.reg LP.clp.skip_s_1, lr51
.reg LP.clp.stop_s_1, lr52
.reg LP.clp.skip_s_2, lr53
.reg LP.clp.stop_s_2, lr54
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
;|                                                               ||
;| Block parameter registers                                     ||
;|                                                               ||
;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

.equ MAX_WORDS, 32
.equ MAKESHIFT, 5

.reg BP.grp.op_skip, lr25
.reg BP.grp.align, lr26
.reg BP.grp.repeat, lr27
.reg BP.grp.count, lr28
.reg BP.dst.align, lr29
.reg BP.dst.lft_addr, lr30
.reg BP.dst.lft_mask, lr31
.reg BP.dst.lft_end, lr32
.reg BP.dst.addr, lr33
.reg BP.dst.array, lr34
.reg BP.dst.array_00, lr34
.reg BP.dst.array_01, lr35
.reg BP.dst.array_02, lr36
.reg BP.dst.array_03, lr37
.reg BP.dst.array_04, lr38
.reg BP.dst.array_05, lr39
.reg BP.dst.array_06, lr40
.reg BP.dst.array_07, lr41
.reg BP.dst.array_08, lr42
.reg BP.dst.array_09, lr43
.reg BP.dst.array_10, lr44
.reg BP.dst.array_11, lr45
.reg BP.dst.array_12, lr46
.reg BP.dst.array_13, lr47
.reg BP.dst.array_14, lr48
.reg BP.dst.array_15, lr49
.reg BP.dst.array_16, lr50
.reg BP.dst.array_17, lr51
.reg BP.dst.array_18, lr52
.reg BP.dst.array_19, lr53
.reg BP.dst.array_20, lr54
.reg BP.dst.array_21, lr55
.reg BP.dst.array_22, lr56
.reg BP.dst.array_23, lr57
.reg BP.dst.array_24, lr58
.reg BP.dst.array_25, lr59
.reg BP.dst.array_26, lr60
.reg BP.dst.array_27, lr61
.reg BP.dst.array_28, lr62
.reg BP.dst.array_29, lr63
.reg BP.dst.array_30, lr64
.reg BP.dst.array_31, lr65
.if MAX_WORDS == 32
.reg BP.dst.array_end, BP.dst.array_31
.endif

.if MAX_WORDS == 16
.reg BP.dst.array_end, BP.dst.array_15
.endif

.if MAX_WORDS == 8
.reg BP.dst.array_end, BP.dst.array_07
.endif

.if MAX_WORDS == 4
.reg BP.dst.array_end, BP.dst.array_03
.endif



.reg BP.src.lft_addr, lr66
.reg BP.src.rgt_ptr, lr67
.reg BP.src.save, lr68
.reg BP.src.shift, lr69
.reg BP.src.addr, lr70
.reg BP.src.extra, lr71
.reg BP.src.array, lr72
.reg BP.src.array_00, lr72
.reg BP.src.array_01, lr73
.reg BP.src.array_02, lr74
.reg BP.src.array_03, lr75
.reg BP.src.array_04, lr76
.reg BP.src.array_05, lr77
.reg BP.src.array_06, lr78
.reg BP.src.array_07, lr79
.reg BP.src.array_08, lr80
.reg BP.src.array_09, lr81
.reg BP.src.array_10, lr82
.reg BP.src.array_11, lr83
.reg BP.src.array_12, lr84
.reg BP.src.array_13, lr85
.reg BP.src.array_14, lr86
.reg BP.src.array_15, lr87
.reg BP.src.array_16, lr88
.reg BP.src.array_17, lr89
.reg BP.src.array_18, lr90
.reg BP.src.array_19, lr91
.reg BP.src.array_20, lr92
.reg BP.src.array_21, lr93
.reg BP.src.array_22, lr94
.reg BP.src.array_23, lr95
.reg BP.src.array_24, lr96
.reg BP.src.array_25, lr97
.reg BP.src.array_26, lr98
.reg BP.src.array_27, lr99
.reg BP.src.array_28, lr100
.reg BP.src.array_29, lr101
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. reg BP .src .array_^30, lrl02 
. reg BP .src.array__31, lrl03 

.if MAX_WORDS == 32 

.reg BP .src.array_end, BP.src.array_31 

.endif 

.if MAX_WORDS ==16 

.reg BP .src.array_end^ BP.src.array_15 

.endif 

.if MAX_WORDS == 8 

.reg BP .src.array_end, BP.src.array__07 

.endif 

.if MAX WORDS == 4 



reg 


BP 


src 


array_end. 


BP.sr 


endif 
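The block-transfer register definitions follow a linear pattern: BP.dst.array_NN sits in lr(34 + NN), BP.src.array_NN sits in lr(72 + NN), and MAX_WORDS selects which element BP.*.array_end aliases. The C sketch below reproduces that mapping, which can help when cross-checking a hand-edited register file. The helper names are hypothetical, and the destination offset is inferred from array_24 through array_31, the only destination entries visible in this excerpt.

```c
#include <stdio.h>

/* Offsets read off the .reg lines above (helper names are hypothetical). */
#define DST_ARRAY_BASE_LR 34  /* BP.dst.array_24 -> lr58 ... array_31 -> lr65  */
#define SRC_ARRAY_BASE_LR 72  /* BP.src.array_00 -> lr72 ... array_31 -> lr103 */

/* Local register number holding element 'index' of the dst word array. */
int dst_array_lr(int index) { return DST_ARRAY_BASE_LR + index; }

/* Local register number holding element 'index' of the src word array. */
int src_array_lr(int index) { return SRC_ARRAY_BASE_LR + index; }

/* Element aliased by BP.*.array_end for MAX_WORDS of 4, 8, 16 or 32.
   Returns -1 for any other value, since no .if block would match it. */
int array_end_index(int max_words)
{
    switch (max_words) {
    case 4: case 8: case 16: case 32:
        return max_words - 1;
    default:
        return -1;
    }
}

/* Prints which local registers array_end resolves to. */
void print_array_map(int max_words)
{
    int end = array_end_index(max_words);

    if (end < 0) {
        printf("unsupported MAX_WORDS %d\n", max_words);
        return;
    }
    printf("dst array_end -> lr%d, src array_end -> lr%d\n",
           dst_array_lr(end), src_array_lr(end));
}
```

For example, with MAX_WORDS set to 16, both array_end symbols resolve to element 15, so the source array_end lands in lr87.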










.reg BP.fst.shift, lr104
.reg BP.fst.skip, lr105
.reg BP.fst.incr, lr106
.reg BP.fst.count, lr107
.reg BP.dst.rgt_ptr, lr108
.reg BP.dst.rgt_mask, lr109
.reg BP.dst.rgt_end, lr110



;+------------------------------------------------------------------+
;| Text parameter registers                                         |
;+------------------------------------------------------------------+

.reg TP.chr.high, lr25
.reg TP.chr.wide, lr26
.reg TP.chr.inset, lr27
.reg TP.chr.ascent, lr28
.reg TP.chr.pitch, lr29
.reg TP.ptn.mask, lr30
.reg TP.ptn.shift_count, lr31
.reg TP.ptn.shift_set, lr32
.reg TP.ptn.high, lr33
.reg TP.ptn.wide, lr34
.reg TP.ptn.next, lr35
;+------------------------------------------------------------------+
;| Shading parameter registers                                      |
;+------------------------------------------------------------------+

.reg SP.gen.more, lr25
.reg SP.gen.left, lr26
.reg SP.gen.right, lr27
.reg LP.gen.cover, lr28
.reg SP.gen.count, lr29
.reg SP.gen.grade, lr30
.reg SP.gen.extra, lr31
.reg SP.gen.resid, lr32
.reg SP.gen.inc_r, lr33
.reg SP.gen.dec_r, lr34
.reg SP.lft.count, lr35
.reg SP.lft.move_p, lr36
.reg SP.lft.move_s, lr37
.reg SP.lft.fix_p, lr38
.reg SP.lft.fix_s, lr39
.reg SP.lft.error, lr40
.reg SP.lft.point, lr41
.reg SP.lft.grade, lr42
.reg SP.lft.extra, lr43
.reg SP.lft.resid, lr44
.reg SP.lft.inc_r, lr45
.reg LP.clp.skip_vec, lr46
.reg LP.clp.stop_vec, lr47
.reg SP.lft.dec_r, lr48
.reg SP.lft.shade, lr49
.reg SP.lft.flag, lr50
.reg SP.lft.x, lr51
.reg SP.lft.y, lr52
.reg SP.lft.m_s_x, lr53
.reg SP.lft.m_p_x, lr54
.reg SP.clp.skip_x, lr55
.reg SP.clp.skip_y, lr56
.reg SP.clp.stop_x, lr57
.reg SP.clp.stop_y, lr58
.reg SP.rgt.count, lr59
.reg SP.rgt.move_p, lr60
.reg SP.rgt.move_s, lr61
.reg SP.rgt.fix_p, lr62
.reg SP.rgt.fix_s, lr63
.reg SP.rgt.error, lr64
.reg SP.rgt.point, lr65
.reg SP.rgt.grade, lr66
.reg SP.rgt.extra, lr67
.reg SP.rgt.resid, lr68
.reg SP.rgt.inc_r, lr69
.reg SP.rgt.dec_r, lr70
.reg SP.rgt.shade, lr71
.reg SP.rgt.flag, lr72



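;+------------------------------------------------------------------+
;| Shading parameter registers                                      |
;+------------------------------------------------------------------+

.reg FP.lft.pixel, lr33
.reg FP.rgt.pixel, lr34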



;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
;++                                                              ++
;++ Programmed Traps                                             ++
;++                                                              ++
;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.equ V_SPILL, 64            ; spill register stack
.equ V_FILL, 65             ; fill register stack
.equ V_CLIP_SKIP, 100       ; skip if clipped
.equ V_CLIP_STOP, 101       ; stop if clipped



;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
;++                                                              ++
;++ Function Prolog/Epilog Macros and Constants                  ++
;++                                                              ++
;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

; The following are fixed memory configuration parameters.

.equ PIXEL_SIZE, 4

; The following constants are used as the argument to the ENTER
; macro.

.equ LINE_PRIMITIVE, 53
.equ BLOCK_PRIMITIVE, 109
.equ TEXT_PRIMITIVE, 46
.equ SHADE_PRIMITIVE, 71
.equ FILL_PRIMITIVE, 71
;------------------------------------------------------------------
; Used at the beginning of a function that is callable from a
; C program, immediately before the label. Only definitions of
; management symbols are made; no code is generated.
;------------------------------------------------------------------

.macro ENTER, _FNTYPE
    .ifndef _FN_MANAGE
    .set _FN_MANAGE, 0
    .endif
    .if _FN_MANAGE != 0
    .print "ENTER without prior LEAVE"
    .err
    .else
    .set _FN_MANAGE, 1
    .set LP_ALLOC, (_FNTYPE + 3) & 0xFFFFFFFE
    .set _PX, 2
    .endif
.endm



;------------------------------------------------------------------
; Used in a function that is callable from a C program to name
; an argument to the function, immediately after the function's
; label. Only a definition of a local register symbol is made;
; no code is generated.
;------------------------------------------------------------------

.macro PARAM, _PNAME
    .ifndef _FN_MANAGE
    .set _FN_MANAGE, 0
    .endif
    .if _FN_MANAGE != 1
    .print "PARAM without prior ENTER, or after CLAIM"
    .err
    .else
    .reg _PNAME, %%(LP_ALLOC + _PX + 128)
    .set _PX, _PX + 1
    .endif
.endm
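ENTER rounds the primitive's frame size up to an even number of words with (FNTYPE + 3) & 0xFFFFFFFE. Each PARAM then names the next local register above that frame, starting at LP_ALLOC + 2, because _PX starts at 2. The absolute register number is LP_ALLOC + _PX + 128, since the 29K numbers the local registers from 128. The C sketch below recomputes these values for the ENTER constants defined above. The helper names are hypothetical.

```c
#include <stdint.h>

/* Mirrors ".set LP_ALLOC, (_FNTYPE + 3) & 0xFFFFFFFE" in ENTER. */
uint32_t lp_alloc(uint32_t fntype)
{
    return (fntype + 3u) & 0xFFFFFFFEu;
}

/* Local register number (lrN) named by the k-th PARAM (k = 0 is the
   first): _PX starts at 2 and is incremented after every PARAM. */
uint32_t param_lr(uint32_t fntype, uint32_t k)
{
    return lp_alloc(fntype) + 2u + k;
}

/* Absolute register number produced by "%%(LP_ALLOC + _PX + 128)". */
uint32_t param_abs_reg(uint32_t fntype, uint32_t k)
{
    return param_lr(fntype, k) + 128u;
}
```

With LINE_PRIMITIVE (53), for instance, LP_ALLOC becomes 56 and the first PARAM names lr58.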



;------------------------------------------------------------------
; Used in a function that is callable from a C program to claim
; space in the local register stack cache, immediately after the
; last PARAM. The calling convention prolog code is generated.
;------------------------------------------------------------------

.macro CLAIM
    .ifndef _FN_MANAGE
    .set _FN_MANAGE, 0
    .endif
    .if _FN_MANAGE != 1
    .print "CLAIM without prior ENTER"
    .err
    .else
    .set _FN_MANAGE, 2
    .if (LP_ALLOC * 4) >= 256
    const tav, LP_ALLOC * 4
    sub rsp, rsp, tav
    .else
    sub rsp, rsp, LP_ALLOC * 4
    .endif
    asgeu V_SPILL, rsp, rsb
    .if ((LP_ALLOC + _PX) * 4) >= 256
    const tav, (LP_ALLOC + _PX) * 4
    add ffb, rsp, tav
    .else
    add ffb, rsp, (LP_ALLOC + _PX) * 4
    .endif
    .endif
.endm
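CLAIM and RELEASE move the register stack pointer by LP_ALLOC * 4 bytes. The 29K register-immediate instruction forms encode only an 8-bit unsigned constant, so both macros first load tav with const once the byte count reaches 256. CLAIM then executes asgeu, which raises the V_SPILL trap when the new rsp has fallen below rsb. The C sketch below models these two decisions. The function names are hypothetical, and the spill test simply restates the asgeu operands: the assertion fails, and the trap is taken, when rsp < rsb as unsigned values.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when "sub rsp, rsp, LP_ALLOC * 4" cannot use the 8-bit
   immediate form, so the byte count must go through tav first. */
bool claim_needs_const(uint32_t lp_alloc)
{
    return lp_alloc * 4u >= 256u;
}

/* Register stack pointer after CLAIM (the register stack grows down). */
uint32_t claim_rsp(uint32_t rsp, uint32_t lp_alloc)
{
    return rsp - lp_alloc * 4u;
}

/* "asgeu V_SPILL, rsp, rsb" asserts rsp >= rsb (unsigned); when the
   assertion fails, trap V_SPILL (64) is taken to spill the cache. */
bool claim_spills(uint32_t new_rsp, uint32_t rsb)
{
    return !(new_rsp >= rsb);
}
```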



;------------------------------------------------------------------
; Used in a function that is callable from a C program to release
; space from the local register stack cache. Part of the calling
; convention epilog code is generated. A normal instruction must
; follow this and precede the LEAVE invocation. This instruction
; may be a NOP.
;------------------------------------------------------------------

.macro RELEASE
    .ifndef _FN_MANAGE
    .set _FN_MANAGE, 0
    .endif
    .if _FN_MANAGE != 2
    .print "RELEASE without prior CLAIM"
    .err
    .else
    .set _FN_MANAGE, 3
    .if (LP_ALLOC * 4) >= 256
    const tav, LP_ALLOC * 4
    add rsp, rsp, tav
    .else
    add rsp, rsp, LP_ALLOC * 4
    .endif
    .endif
.endm
;------------------------------------------------------------------
; Used to end a function that is callable from a C program. The
; remainder of the calling convention epilog code is generated.
;------------------------------------------------------------------

.macro LEAVE
    .ifndef _FN_MANAGE
    .set _FN_MANAGE, 0
    .endif
    .if _FN_MANAGE != 3
    .print "LEAVE without prior RELEASE"
    .err
    .else
    .set _FN_MANAGE, 0
    jmpi ret
    asleu V_FILL, ffb, rfb
    .endif
.endm
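The _FN_MANAGE symbol turns the five macros into a small state machine. ENTER moves it from 0 to 1, PARAM is legal only in state 1, CLAIM moves it from 1 to 2, RELEASE from 2 to 3, and LEAVE from 3 back to 0. Any other order reaches .err. The C model below exists only to make that legal order explicit. The enum and function names are hypothetical. It also adds one extra rule of its own: a checked sequence must finish back in state 0.

```c
#include <stdbool.h>

enum fn_macro { M_ENTER, M_PARAM, M_CLAIM, M_RELEASE, M_LEAVE };

/* Applies one macro to the _FN_MANAGE state. Returns false where the
   assembler macro would execute .print/.err. */
bool fn_manage_step(int *state, enum fn_macro m)
{
    switch (m) {
    case M_ENTER:   if (*state != 0) return false; *state = 1; return true;
    case M_PARAM:   return *state == 1;            /* state unchanged */
    case M_CLAIM:   if (*state != 1) return false; *state = 2; return true;
    case M_RELEASE: if (*state != 2) return false; *state = 3; return true;
    case M_LEAVE:   if (*state != 3) return false; *state = 0; return true;
    }
    return false;
}

/* Checks a whole macro sequence, starting from the 0 that .ifndef sets.
   Modeling choice: the sequence must also end back in state 0. */
bool fn_manage_check(const enum fn_macro *seq, int n)
{
    int state = 0;

    for (int i = 0; i < n; ++i)
        if (!fn_manage_step(&state, seq[i]))
            return false;
    return state == 0;
}
```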



; end of g29k_reg.h
GRAPH29K.H

/*******************************************************************
** graph29k.h                          C 29000 Graphics Benchmarks **
**                                                                **
** Copyright 1988 Advanced Micro Devices, Inc.                    **
** Written by Gibbons and Associates, Inc.                        **
**                                                                **
** This file contains functions to provide queries to the         **
** tester functions.                                              **
*******************************************************************/

#if !defined (GRAPH29K)
#define GRAPH29K

/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++                                                                ++
++ Type Definitions                                               ++
++                                                                ++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/

typedef struct parameters             /* for controlling graphics */
{
    int             wnd_min_x;        /* min window x-coord (origin relative) */
    int             wnd_max_x;        /* max window x-coord (origin relative) */
    int             wnd_min_y;        /* min window y-coord (origin relative) */
    int             wnd_max_y;        /* max window y-coord (origin relative) */
    unsigned long   pxl_value;        /* current pixel color or shading value */
    unsigned int    mem_width;        /* no. bytes added to move down one */
    int             mem_depth;        /* pixels/word code: -1=32 0=4 1=2 2=1 */
    unsigned char * wnd_base;         /* base address of window origin */
    unsigned int    wnd_align;        /* no. pixels added to get actual origin */
    void         (* pxl_op_vec) ();   /* pointer to routine to do pixel-op */
    unsigned long   pxl_in_mask;      /* pixel-op memory src input mask */
    unsigned long   pxl_do_mask;      /* pixel-op memory src acceptance mask */
    unsigned long   pxl_do_value;     /* pixel-op memory src acceptance value */
    unsigned long   pxl_out_mask;     /* pixel-op memory dst output mask */
    int             wid_actual;       /* actual pixel width of line segment */
    int             pxl_op_code;      /* encoded value for current pixel-op */
    unsigned char * mem_base;         /* base address of graphics raster */
    unsigned int    wnd_origin_x;     /* x-coord of origin (raster relative) */
    unsigned int    wnd_origin_y;     /* y-coord of origin (raster relative) */
} parameters;
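The mem_depth field is a code, not a bit count. According to the comment above and the S_M1_01 parameter notes, -1 means one bit per pixel (32 pixels per 32-bit word), while 0, 1 and 2 mean 8-, 16- and 32-bit pixel storage (4, 2 and 1 pixels per word). The C helper below decodes the code under that reading. The function names are hypothetical, and the 8 << depth rule simply restates those comments.

```c
/* Bits of storage per pixel for a graph29k.h mem_depth code;
   returns 0 for a code the comments do not define. */
int depth_bits_per_pixel(int mem_depth)
{
    if (mem_depth == -1)
        return 1;                     /* monochrome */
    if (mem_depth >= 0 && mem_depth <= 2)
        return 8 << mem_depth;        /* 8, 16 or 32 bits */
    return 0;
}

/* Pixels packed into one 32-bit word ("-1=32 0=4 1=2 2=1"). */
int depth_pixels_per_word(int mem_depth)
{
    int bits = depth_bits_per_pixel(mem_depth);

    return bits ? 32 / bits : 0;
}
```

The L1 tests, for example, set mem_depth to 2 with mem_width = 256 * 4, which describes a 256-pixel-wide raster of 32-bit pixels.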



typedef struct point                  /* for drawing position */
{
    int     x;                        /* x-coordinate (origin relative) */
    int     y;                        /* y-coordinate (origin relative) */
} point;

typedef struct                        /* 3D floating-point coordinates */
{
    double  x;                        /* x-coord of 3D vector */
    double  y;                        /* y-  "   "  "    "    */
    double  z;                        /* z-  "   "  "    "    */
} vector;

typedef struct                        /* vertex description */
{
    vector  n;                        /* normal vector */
    point   p;                        /* graphics coord point */
    int     i;                        /* intensity value */
} vertex;

typedef struct                        /* triangle description */
{
    point   p1;                       /* 1st pt coordinates */
    int     i1;                       /* 1st pt intensity value */
    point   p2;                       /* 2nd pt coordinates */
    int     i2;                       /* 2nd pt intensity value */
    point   p3;                       /* 3rd pt coordinates */
    int     i3;                       /* 3rd pt intensity value */
} triangle;



/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++                                                                ++
++ External Variables                                             ++
++                                                                ++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/

extern parameters                     /* current graphics control parameters */
    G29K_Params;

/*------------------------------------------------------------------*/

extern triangle                       /* set of triangle arrays for spheres */
    Tri_00[512], Tri_01[512], Tri_02[512], Tri_03[512],
    Tri_04[512], Tri_05[512], Tri_06[512], Tri_07[512],
    Tri_08[512], Tri_09[512], Tri_10[512], Tri_11[512],
    Tri_12[512], Tri_13[512], Tri_14[512], Tri_15[512],
    Tri_16[512], Tri_17[512], Tri_18[512], Tri_19[512],
    Tri_20[512], Tri_21[512], Tri_22[512], Tri_23[512],
    Tri_24[512], Tri_25[512], Tri_26[512], Tri_27[512],
    Tri_28[512], Tri_29[512], Tri_30[512], Tri_31[512];

extern vertex                         /* set of vertex arrays for spheres */
    Vrt_00[512], Vrt_01[512];

extern triangle *                     /* base of triangles list for spheres */
    Triangles;

/*------------------------------------------------------------------*/

extern unsigned char                  /* bit-map memory */
    BitMap[],
    BM_MB_LFT[], BM_ML_MID[], BM_ML_LLC[],
    BM_CB_LFT[], BM_CL_MID[], BM_CL_LLC[];






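/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++                                                                ++
++ Function Prototypes                                            ++
++                                                                ++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/

unsigned int
_cycles                               /* get partial cycles counter */
    (void);
/*
    Obtains the least significant 32 bits of the system cycle counter.

    Parameters: none

    Return:     least significant 32 bits of cycle counter
*/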
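Every L1 test brackets the primitive with two _cycles() calls and subtracts the calibrated Overhead. Because the counter is read as an unsigned 32-bit value, the subtraction stays correct even if the counter wraps once between the two reads: unsigned arithmetic is modular. The sketch below shows that arithmetic, plus the rounded average that the tests report. The helper names are hypothetical, and _cycles itself is not modeled.

```c
#include <stdint.h>

/* Elapsed cycles between two reads of a 32-bit counter, less the
   measurement overhead. Modular subtraction tolerates one wrap. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end, uint32_t overhead)
{
    return (end - start) - overhead;
}

/* Rounded average, as in "Times = (Times + Lines / 2) / Lines;". */
uint32_t rounded_average(uint32_t total, uint32_t count)
{
    return (total + count / 2u) / count;
}
```

For example, a start reading of 0xFFFFFFF0 and an end reading of 0x10 still yield 0x20 cycles before the overhead is removed.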



/*------------------------------------------------------------------*/

void
S_C1_01                               /* set clipping trap vectors */
    (void);
/*
    Sets the vectors for the clipping traps.

    Parameters: none

    Return:     none
*/



/*------------------------------------------------------------------*/

triangle *
Sphere                                /* model a sphere */
    (
    int L_Gamma,                      /* <- light source gamma angle */
    int L_Theta,                      /* <- light source theta angle */
    int L_Reflect,                    /* <- reflection proportion */
    int L_Ambient,                    /* <- ambient constant */
    int M_Radius,                     /* <- radius of modeled sphere */
    int M_Rings,                      /* <- no. of model rings */
    int M_Sects                       /* <- no. of 1st ring sections */
    );
/*
    Models a sphere as a set of triangles.

    Parameters: see prototype

    Return:     pointer to last triangle in list, or NULL
*/

#endif

/* end of graph29k.h */
/******************************************************************/
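TEST_L1.C

/*******************************************************************
** test_l1.c                           C 29000 Graphics Benchmarks **
**                                                                **
** Copyright 1988 Advanced Micro Devices, Inc.                    **
** Written by Gibbons and Associates, Inc.                        **
**                                                                **
** These are the test functions for the L1 primitives.            **
*******************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "graph29k.h"
#include "p_l1.h"
#include "dumpmap.h"

/*******************************************************************
** Definitions                                                    **
*******************************************************************/

/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++                                                                ++
++ Global Functions                                               ++
++                                                                ++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/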



void
T_L1_01_01                      /* single line with P_L1_01 */
    (
    unsigned int Overhead,      /* <- timing overhead */
    FILE * Report               /* <- report output file */
    )
/*
    Runs a simple test of "P_L1_01" with a single, fixed line.

    Parameters:
        Overhead    overhead associated with the timing
                    measurement; should be subtracted
                    from the time for each repetition
                    of the function being timed.
        Report      stream pointer for the file to which
                    reports are to be written.

    Return:     none
*/
{
    unsigned char * Base;       /* base of graphics memory */
    unsigned int Time;          /* function time counter */
    unsigned int Count;         /* memory clear counter */

    fprintf (Report, "L1_01.01: ");

    /*
    ** The following parameters are initialized for use
    ** by P_L1_01.
    */

    G29K_Params.pxl_value = 0xFFFFFFFFL;
    G29K_Params.mem_width = 256 * 4;
    G29K_Params.mem_depth = 2;              /* 32 planes */
    G29K_Params.wnd_base = BM_CL_LLC;
    G29K_Params.wnd_align = 0;

    /*
    ** The following parameters are initialized but are
    ** -N-O-T- used by P_L1_01.
    */

    G29K_Params.wnd_min_x = 0;
    G29K_Params.wnd_max_x = 255;
    G29K_Params.wnd_min_y = 0;
    G29K_Params.wnd_max_y = 255;
    G29K_Params.pxl_op_vec = NULL;
    G29K_Params.pxl_in_mask = 0xFFFFFFFFL;
    G29K_Params.pxl_do_mask = 0x00000000L;
    G29K_Params.pxl_do_value = 0x00000000L;
    G29K_Params.pxl_out_mask = 0xFFFFFFFFL;
    G29K_Params.wid_actual = 1;
    G29K_Params.pxl_op_code = 0;
    G29K_Params.mem_base = BitMap;
    G29K_Params.wnd_origin_x = 0;
    G29K_Params.wnd_origin_y = 255;

    /*
    ** The bit-map memory is cleared (256 x 256 pixels,
    ** 4 bytes per pixel).
    */

    Base = BitMap;
    Count = 256 * 256 * 4;
    while (Count--)
        *Base++ = 0;

    /*
    ** The vector is drawn and the timing measurement is
    ** taken.
    */

    Time = _cycles ();
    P_L1_01 (20, 20, 65, 54);
    Time = (_cycles () - Time) - Overhead;

    /*
    ** The time measurement is reported and the bit-map
    ** is compressed and dumped to a file.
    */

    fprintf (Report, "%u cycles\n", Time);
    DumpMap (BitMap, 256, 256, 2, 256 * 4, "DT_L1_01.01");
    return;
}

/*------------------------------------------------------------------*/

void
T_L1_01_02                      /* all 10-pixel lines with P_L1_01 */
    (
    unsigned int Overhead,      /* <- timing overhead */
    FILE * Report               /* <- report output file */
    )
/*
    Runs a test of "P_L1_01" using all possible segments that
    are ten pixels long. These are drawn in six concentric
    rings around the center of the bit-map.

    Parameters:
        Overhead    overhead associated with the timing
                    measurement; should be subtracted
                    from the time for each repetition
                    of the function being timed.
        Report      stream pointer for the file to which
                    reports are to be written.

    Return:     none
*/
{
    unsigned char * Base;       /* base of graphics memory */
    unsigned int Time;          /* function time counter */
    unsigned int Times;         /* sum of individual times */
    unsigned int Lines;         /* no. of lines drawn */
    int BgnX;                   /* x-coord of begin point (may be negative) */
    int BgnY;                   /* y-coord  "   "     "   */
    int EndX;                   /* x-coord of end point */
    int EndY;                   /* y-coord  "  "    "   */
    int OffX;                   /* x-coord of offset */
    int OffY;                   /* y-coord  "   "    */
    unsigned int Count;         /* memory clear counter */

    fprintf (Report, "L1_01.02: ");

    /*
    ** The following parameters are initialized for use by P_L1_01.
    */

    G29K_Params.pxl_value = 0xFFFFFFFFL;
    G29K_Params.mem_width = 256 * 4;
    G29K_Params.mem_depth = 2;              /* 32 planes */
    G29K_Params.wnd_base = BM_CL_MID;
    G29K_Params.wnd_align = 0;

    /*
    ** The following parameters are initialized but are
    ** -N-O-T- used by P_L1_01.
    */

    G29K_Params.wnd_min_x = -128;
    G29K_Params.wnd_max_x = 127;
    G29K_Params.wnd_min_y = -128;
    G29K_Params.wnd_max_y = 127;
    G29K_Params.pxl_op_vec = NULL;
    G29K_Params.pxl_in_mask = 0xFFFFFFFFL;
    G29K_Params.pxl_do_mask = 0x00000000L;
    G29K_Params.pxl_do_value = 0x00000000L;
    G29K_Params.pxl_out_mask = 0xFFFFFFFFL;
    G29K_Params.wid_actual = 1;
    G29K_Params.pxl_op_code = 0;
    G29K_Params.mem_base = BitMap;
    G29K_Params.wnd_origin_x = 128;
    G29K_Params.wnd_origin_y = 127;

    /*
    ** The bit-map memory is cleared.
    */

    Base = BitMap;
    Count = 256 * 256 * 4;
    while (Count--)
        *Base++ = 0;

    /*
    ** Each possible 10-pixel long vector is drawn once, in a
    ** clockwise direction around the center of the bit-map.
    ** Cycles are counted for each vector and accumulated.
    ** The number of vectors is also counted.
    */

    Times = 0; Lines = 0;

    for (OffX = 0, OffY = 10; OffX < 10; ++OffX)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    for ( ; OffY > -10; --OffY)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    for ( ; OffX > -10; --OffX)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    for ( ; OffY < 10; ++OffY)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    for ( ; OffX < 0; ++OffX)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_01 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    /*
    ** The total time and the average time per segment are
    ** reported, then the bit-map is compressed and dumped
    ** to a file.
    */

    fprintf (Report, "%u cycles", Times);
    Times = (Times + Lines / 2) / Lines;
    fprintf (Report, " (%u per segment)\n", Times);
    DumpMap (BitMap, 256, 256, 2, 256 * 4, "DT_L1_01.02");
    return;
}
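The five loops in T_L1_01_02 walk the perimeter of the square |dx| <= 10, |dy| <= 10. Every offset therefore has a Chebyshev length, max(|dx|, |dy|), of exactly 10 pixels, and each offset is drawn twice, starting at 3x and at 10x its value from the window origin. The sketch below replays the same loop bounds so the segment count can be checked. The function name is hypothetical.

```c
#include <stdlib.h>

/* Replays the five loops of T_L1_01_02 and counts the offsets visited.
   Returns -1 if any offset is not exactly 10 pixels long (Chebyshev). */
int count_ring_offsets(void)
{
    int offx, offy, n = 0;

#define VISIT(dx, dy)                                   \
    do {                                                \
        int ax = abs(dx), ay = abs(dy);                 \
        if ((ax > ay ? ax : ay) != 10) return -1;       \
        ++n;                                            \
    } while (0)

    for (offx = 0, offy = 10; offx < 10; ++offx) VISIT(offx, offy); /* top    */
    for ( ; offy > -10; --offy)                 VISIT(offx, offy); /* right  */
    for ( ; offx > -10; --offx)                 VISIT(offx, offy); /* bottom */
    for ( ; offy < 10; ++offy)                  VISIT(offx, offy); /* left   */
    for ( ; offx < 0; ++offx)                   VISIT(offx, offy); /* top    */

#undef VISIT
    return n;
}
```

The count comes to 80 offsets, so the test draws 160 segments in total. The farthest endpoint lies 11 * 10 = 110 pixels from the origin, which keeps every segment inside the -128..127 window.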
/*------------------------------------------------------------------*/

void
T_L1_02_01                      /* single line with P_L1_02 */
    (
    unsigned int Overhead,      /* <- timing overhead */
    FILE * Report               /* <- report output file */
    )
/*
    Runs a simple test of "P_L1_02" with a single, fixed line.

    Parameters:
        Overhead    overhead associated with the timing
                    measurement; should be subtracted
                    from the time for each repetition
                    of the function being timed.
        Report      stream pointer for the file to which
                    reports are to be written.

    Return:     none
*/
{
    unsigned char * Base;       /* base of graphics memory */
    unsigned int Time;          /* function time counter */
    unsigned int Count;         /* memory clear counter */

    fprintf (Report, "L1_02.01: ");

    /*
    ** The following parameters are initialized for use
    ** by P_L1_02.
    */

    G29K_Params.pxl_value = 0xFFFFFFFFL;
    G29K_Params.mem_width = 256 * 4;
    G29K_Params.mem_depth = 2;              /* 32 planes */
    G29K_Params.wnd_base = BM_CL_LLC;
    G29K_Params.wnd_align = 0;
    G29K_Params.wnd_min_x = 0;
    G29K_Params.wnd_max_x = 255;
    G29K_Params.wnd_min_y = 0;
    G29K_Params.wnd_max_y = 255;

    /*
    ** The following parameters are initialized but are
    ** -N-O-T- used by P_L1_02.
    */

    G29K_Params.pxl_op_vec = NULL;
    G29K_Params.pxl_in_mask = 0xFFFFFFFFL;
    G29K_Params.pxl_do_mask = 0x00000000L;
    G29K_Params.pxl_do_value = 0x00000000L;
    G29K_Params.pxl_out_mask = 0xFFFFFFFFL;
    G29K_Params.wid_actual = 1;
    G29K_Params.pxl_op_code = 0;
    G29K_Params.mem_base = BitMap;
    G29K_Params.wnd_origin_x = 0;
    G29K_Params.wnd_origin_y = 255;

    /*
    ** The bit-map memory is cleared.
    */

    Base = BitMap;
    Count = 256 * 256 * 4;
    while (Count--)
        *Base++ = 0;

    /*
    ** The vector is drawn and the timing measurement is
    ** taken.
    */

    Time = _cycles ();
    P_L1_02 (20, 20, 65, 54);
    Time = (_cycles () - Time) - Overhead;

    /*
    ** The time measurement is reported and the bit-map
    ** is compressed and dumped to a file.
    */

    fprintf (Report, "%u cycles\n", Time);
    DumpMap (BitMap, 256, 256, 2, 256 * 4, "DT_L1_02.01");
    return;
}



/*------------------------------------------------------------------*/

void
T_L1_02_02                      /* all 10-pixel lines with P_L1_02 */
    (
    unsigned int Overhead,      /* <- timing overhead */
    FILE * Report               /* <- report output file */
    )
/*
    Runs a test of "P_L1_02" using all possible segments that
    are ten pixels long. These are drawn in six concentric
    rings around the center of the bit-map.

    Parameters:
        Overhead    overhead associated with the timing
                    measurement; should be subtracted
                    from the time for each repetition
                    of the function being timed.
        Report      stream pointer for the file to which
                    reports are to be written.

    Return:     none
*/
{
    unsigned char * Base;       /* base of graphics memory */
    unsigned int Time;          /* function time counter */
    unsigned int Times;         /* sum of individual times */
    unsigned int Lines;         /* no. of lines drawn */
    int BgnX;                   /* x-coord of begin point (may be negative) */
    int BgnY;                   /* y-coord  "   "     "   */
    int EndX;                   /* x-coord of end point */
    int EndY;                   /* y-coord  "  "    "   */
    int OffX;                   /* x-coord of offset */
    int OffY;                   /* y-coord  "   "    */
    unsigned int Count;         /* memory clear counter */

    fprintf (Report, "L1_02.02: ");

    /*
    ** The following parameters are initialized for use
    ** by P_L1_02.
    */

    G29K_Params.pxl_value = 0xFFFFFFFFL;
    G29K_Params.mem_width = 256 * 4;
    G29K_Params.mem_depth = 2;              /* 32 planes */
    G29K_Params.wnd_base = BM_CL_MID;
    G29K_Params.wnd_align = 0;
    G29K_Params.wnd_min_x = -128;
    G29K_Params.wnd_max_x = 127;
    G29K_Params.wnd_min_y = -128;
    G29K_Params.wnd_max_y = 127;

    /*
    ** The following parameters are initialized but are
    ** -N-O-T- used by P_L1_02.
    */

    G29K_Params.pxl_op_vec = NULL;
    G29K_Params.pxl_in_mask = 0xFFFFFFFFL;
    G29K_Params.pxl_do_mask = 0x00000000L;
    G29K_Params.pxl_do_value = 0x00000000L;
    G29K_Params.pxl_out_mask = 0xFFFFFFFFL;
    G29K_Params.wid_actual = 1;
    G29K_Params.pxl_op_code = 0;
    G29K_Params.mem_base = BitMap;
    G29K_Params.wnd_origin_x = 128;
    G29K_Params.wnd_origin_y = 127;

    /*
    ** The bit-map memory is cleared.
    */

    Base = BitMap;
    Count = 256 * 256 * 4;
    while (Count--)
        *Base++ = 0;

    /*
    ** Each possible 10-pixel long vector is drawn once, in a
    ** clockwise direction around the center of the bit-map.
    ** Cycles are counted for each vector and accumulated.
    ** The number of vectors is also counted.
    */

    Times = 0; Lines = 0;

    for (OffX = 0, OffY = 10; OffX < 10; ++OffX)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    for ( ; OffY > -10; --OffY)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    for ( ; OffX > -10; --OffX)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    for ( ; OffY < 10; ++OffY)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    for ( ; OffX < 0; ++OffX)
    {
        BgnX = 3 * OffX;     BgnY = 3 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;

        BgnX = 10 * OffX;    BgnY = 10 * OffY;
        EndX = BgnX + OffX;  EndY = BgnY + OffY;
        Time = _cycles ();
        P_L1_02 (BgnX, BgnY, EndX, EndY);
        Times += (_cycles () - Time) - Overhead;
        ++Lines;
    }

    /*
    ** The total time and the average time per segment are
    ** reported, then the bit-map is compressed and dumped
    ** to a file.
    */

    fprintf (Report, "%u cycles", Times);
    Times = (Times + Lines / 2) / Lines;
    fprintf (Report, " (%u per segment)\n", Times);
    DumpMap (BitMap, 256, 256, 2, 256 * 4, "DT_L1_02.02");
    return;
}



/*------------------------------------------------------------------*/

void
T_L1_02_03                      /* clipped pentagram */
    (
    unsigned int Overhead,      /* <- timing overhead */
    FILE * Report               /* <- report output file */
    )
/*
    Runs a test of "P_L1_02" drawing two pentagrams. The
    larger one is clipped by the window, while the smaller
    one is centered inside the larger.

    Parameters:
        Overhead    overhead associated with the timing
                    measurement; should be subtracted
                    from the time for each repetition
                    of the function being timed.
        Report      stream pointer for the file to which
                    reports are to be written.

    Return:     none
*/
{
    unsigned char * Base;       /* base of graphics memory */
    unsigned int Time;          /* function time counter */
    unsigned int Times;         /* sum of individual times */
    unsigned int Count;         /* memory clear counter */

    fprintf (Report, "L1_02.03: ");






    /*
    ** The following parameters are initialized for use
    ** by P_L1_02.
    */

    G29K_Params.pxl_value = 0xFFFFFFFFL;
    G29K_Params.mem_width = 256 * 4;
    G29K_Params.mem_depth = 2;              /* 32 planes */
    G29K_Params.wnd_base = BM_CL_MID;
    G29K_Params.wnd_align = 0;
    G29K_Params.wnd_min_x = -128;
    G29K_Params.wnd_max_x = 127;
    G29K_Params.wnd_min_y = -128;
    G29K_Params.wnd_max_y = 127;

    /*
    ** The following parameters are initialized but are
    ** -N-O-T- used by P_L1_02.
    */

    G29K_Params.pxl_op_vec = NULL;
    G29K_Params.pxl_in_mask = 0xFFFFFFFFL;
    G29K_Params.pxl_do_mask = 0x00000000L;
    G29K_Params.pxl_do_value = 0x00000000L;
    G29K_Params.pxl_out_mask = 0xFFFFFFFFL;
    G29K_Params.wid_actual = 1;
    G29K_Params.pxl_op_code = 0;
    G29K_Params.mem_base = BitMap;
    G29K_Params.wnd_origin_x = 128;
    G29K_Params.wnd_origin_y = 127;

    /*
    ** The bit-map memory is cleared.
    */

    Base = BitMap;
    Count = 256 * 256 * 4;
    while (Count--)
        *Base++ = 0;

    /*
    ** The pentagrams are drawn. A timing measurement is
    ** taken for each vector and accumulated.
    */

    Times = 0;

    Time = _cycles ();
    P_L1_02 (-28, -39, 0, 48);
    Times += (_cycles () - Time) - Overhead;
    Time = _cycles ();
    P_L1_02 (0, 48, 28, -39);
    Times += (_cycles () - Time) - Overhead;
    Time = _cycles ();
    P_L1_02 (28, -39, -46, 15);
    Times += (_cycles () - Time) - Overhead;
    Time = _cycles ();
    P_L1_02 (-46, 15, 46, 15);
    Times += (_cycles () - Time) - Overhead;
    Time = _cycles ();
    P_L1_02 (46, 15, -28, -39);
    Times += (_cycles () - Time) - Overhead;

    Time = _cycles ();
    P_L1_02 (-113, -155, 0, 192);
    Times += (_cycles () - Time) - Overhead;
    Time = _cycles ();
    P_L1_02 (0, 192, 113, -155);
    Times += (_cycles () - Time) - Overhead;
    Time = _cycles ();
    P_L1_02 (113, -155, -183, 59);
    Times += (_cycles () - Time) - Overhead;
    Time = _cycles ();
    P_L1_02 (-183, 59, 183, 59);
    Times += (_cycles () - Time) - Overhead;
    Time = _cycles ();
    P_L1_02 (183, 59, -113, -155);
    Times += (_cycles () - Time) - Overhead;

    /*
    ** The total time and the average time per segment are
    ** reported, then the bit-map is compressed and dumped
    ** to a file.
    */

    fprintf (Report, "%u cycles", Times);
    Times = (Times + 5) / 10;
    fprintf (Report, " (%u per segment)\n", Times);
    DumpMap (BitMap, 256, 256, 2, 256 * 4, "DT_L1_02.03");
    return;
}
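T_L1_02_03 relies on the window limits (-128..127 on both axes, origin relative) to force clipping of the larger pentagram. The Cohen-Sutherland outcode test sketched below is one common way to see which segments need clipping. It is offered only as an illustration and is not claimed to be how P_L1_02 clips. The window constants are copied from the test; the function names are hypothetical.

```c
/* Cohen-Sutherland region codes against the test window. */
enum { OC_LEFT = 1, OC_RIGHT = 2, OC_BELOW = 4, OC_ABOVE = 8 };

#define WND_MIN_X (-128)
#define WND_MAX_X 127
#define WND_MIN_Y (-128)
#define WND_MAX_Y 127

/* Region code of one endpoint: 0 means inside the window. */
int outcode(int x, int y)
{
    int code = 0;

    if (x < WND_MIN_X)      code |= OC_LEFT;
    else if (x > WND_MAX_X) code |= OC_RIGHT;
    if (y < WND_MIN_Y)      code |= OC_BELOW;
    else if (y > WND_MAX_Y) code |= OC_ABOVE;
    return code;
}

/* 0 = trivially accepted, 1 = trivially rejected, 2 = must be clipped. */
int classify_segment(int x0, int y0, int x1, int y1)
{
    int c0 = outcode(x0, y0), c1 = outcode(x1, y1);

    if ((c0 | c1) == 0) return 0;     /* both endpoints inside */
    if ((c0 & c1) != 0) return 1;     /* both beyond the same edge */
    return 2;
}
```

Under this test, every segment of the small pentagram is trivially accepted. Every vertex of the large pentagram lies outside the window, yet each of its segments crosses it, so all five segments require clipping.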

/******************************************************************/
/* end of test_l1.c */
S_M1_01.S

.title "C 29000 Graphics Benchmarks"
.sbttl "Translate Pixel Coords to Linear Address"

;******************************************************************
;** s_m1_01.s                         C 29000 Graphics Benchmarks **
;**                                                              **
;** Copyright 1988 Advanced Micro Devices, Inc.                  **
;** Written by Gibbons and Associates, Inc.                      **
;**                                                              **
;** An internal subroutine to translate signed integer, pixel    **
;** coordinates to a linear address.                             **
;******************************************************************

.include "g29k_reg.h"

.eject
.sbttl "Coordinates to Address Computation"

;******************************************************************
;** Definitions                                                  **
;******************************************************************

;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
;++ Global Functions                                             ++
;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.sect GRAPHX, text
.use GRAPHX

.global S_M1_01

S_M1_01:

;------------------------------------------------------------------
; Translates signed integer, pixel coordinates to a linear
; address, according to the current control parameters.
;
; Parameters:
;   LP.loc.x        x-coordinate of pixel position
;   LP.loc.y        y-coordinate of pixel position
;   GP.mem.width    width, in bytes, of raster (assumed
;                   less than or equal to 65536)
;   GP.mem.depth    depth code for raster
;                   -1 - one bit per pixel (monochrome)
;                    0 - up to eight bits per pixel
;                    1 - up to sixteen bits per pixel
;                    2 - up to thirty-two bits per pixel
;   GP.wnd.base     linear address of the window origin
;   GP.wnd.align    pixel alignment of window origin (only
;                   used for monochrome raster)
;
; Products:
;   LP.loc.addr     linear address of pixel position
;   LP.loc.align    pixel alignment (used for monochrome)
;
; Return: none
;------------------------------------------------------------------



        .reg    Temp0, t0

        const   LP.loc.addr, 0                      ; clear partial product
        mtsr    q, GP.mem.width                     ; multiplier = raster width
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr  ; 18 multiply steps: y * width
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mul     LP.loc.addr, LP.loc.y, LP.loc.addr
        mfsr    Temp0, q                            ; Q holds the low 18 product bits
        mtsrim  fc, 18
        extract LP.loc.addr, LP.loc.addr, Temp0     ; assemble 32-bit product
        jmpt    GP.mem.depth, $01                   ; depth < 0: monochrome
        subr    LP.loc.addr, LP.loc.addr, 0         ; (delay slot) negate row offset
        sll     Temp0, LP.loc.x, GP.mem.depth       ; x * bytes per pixel
        add     LP.loc.addr, LP.loc.addr, Temp0
        add     LP.loc.addr, GP.wnd.base, LP.loc.addr
        and     LP.loc.align, LP.loc.addr, 3        ; byte offset within word
        jmpi    ret
        sll     LP.loc.align, LP.loc.align, 3       ; (delay slot) convert to bits

$01:
        add     Temp0, LP.loc.x, GP.wnd.align       ; bit position of pixel
        and     LP.loc.align, Temp0, 31             ; bit within word
        sra     Temp0, Temp0, 5                     ; word index
        sll     Temp0, Temp0, 2                     ; word index as byte offset
        add     LP.loc.addr, LP.loc.addr, Temp0
        jmpi    ret
        add     LP.loc.addr, GP.wnd.base, LP.loc.addr  ; (delay slot) add window base

;*****************************************************************

        .end    ; of s_m1_01.s
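The arithmetic in S_M1_01 can be checked with a small C model. This is a sketch, not AMD code. It assumes that `mul` behaves as a signed multiply step (add the multiplicand when bit 0 of Q is set, shift the 33-bit sum right one place, and move its low bit into bit 31 of Q), that `extract` with a funnel count of 18 returns the upper 32 bits of the register pair shifted left by 18, and that the `subr` in the `jmpt` delay slot subtracts the row offset from 0, so that increasing y moves toward lower addresses. `MulStep`, `RowOffset`, `PixelLoc`, and `PixelAddress` are hypothetical names.

```c
#include <stdint.h>

/* One signed multiply step, modelled on the Am29000 "mul" instruction:
   if bit 0 of Q is set, add the multiplicand to the partial product, then
   shift the exact sum right one place, moving its low bit into bit 31 of Q. */
static void MulStep(int32_t *acc, uint32_t *q, int32_t multiplicand)
{
    int64_t sum = (int64_t)*acc + ((*q & 1u) ? multiplicand : 0);
    *q = (*q >> 1) | ((uint32_t)((uint64_t)sum & 1u) << 31);
    /* floor(sum / 2), written without right-shifting a negative value */
    *acc = (int32_t)(sum >= 0 ? sum / 2 : -((-sum + 1) / 2));
}

/* y * width using 18 multiply steps followed by an extract with fc = 18.
   Valid while width < 2^18 and the product fits in 32 bits. */
static int32_t RowOffset(int32_t y, uint32_t width)
{
    int32_t acc = 0;                 /* const  LP.loc.addr, 0 */
    uint32_t q = width;              /* mtsr   q, GP.mem.width */
    for (int i = 0; i < 18; i++)
        MulStep(&acc, &q, y);        /* mul    LP.loc.addr, LP.loc.y, LP.loc.addr */
    /* extract: high bits from acc, low 18 bits from the top of Q */
    return (int32_t)(((uint32_t)acc << 18) | (q >> 14));
}

typedef struct {
    uint32_t addr;    /* LP.loc.addr  */
    uint32_t align;   /* LP.loc.align */
} PixelLoc;

/* depth follows GP.mem.depth: -1 = 1 bit, 0 = 8 bits, 1 = 16 bits, 2 = 32 bits. */
static PixelLoc PixelAddress(int32_t x, int32_t y, uint32_t width, int depth,
                             uint32_t base, int32_t wndAlign)
{
    PixelLoc loc;
    /* Row offset is subtracted (assumed "subr ..., 0"): +y moves to lower addresses. */
    uint32_t rel = 0u - (uint32_t)RowOffset(y, width);

    if (depth < 0) {                            /* monochrome, 32 pixels per word */
        int32_t bit = x + wndAlign;
        loc.align = (uint32_t)bit & 31u;        /* bit within the word */
        /* "sra 5" is a floor division by 32, also for negative positions */
        int32_t word = bit >= 0 ? bit / 32 : -((-bit + 31) / 32);
        loc.addr = base + rel + (uint32_t)word * 4u;
    } else {                                    /* 1, 2 or 4 bytes per pixel */
        loc.addr = base + rel + ((uint32_t)x << depth);
        loc.align = (loc.addr & 3u) * 8u;       /* byte offset in word, in bits */
    }
    return loc;
}
```

For example, on an 8-bit raster 640 bytes wide with the window origin at 0x100000, pixel (10, 2) maps to 0x100000 - 2 * 640 + 10. Doing the multiply with 18 steps rather than a full 32-step sequence works only because the listing restricts the raster width to at most 65536 bytes.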






Graphics Primitives 



S_C1_01.S



        .title  "C 29000 Graphics Benchmarks"
        .sbttl  "Clipping Trap Vectors and Handlers"
;*****************************************************************
;*                                                              **
;* s_c1_01.s            C 29000 Graphics Benchmarks             **
;*                                                              **
;* Copyright 1988 Advanced Micro Devices, Inc.                  **
;* Written by Gibbons and Associates, Inc.                      **
;*                                                              **
;*                                                              **
;* Function to set the clipping vectors to the clipping trap    **
;* handlers. The handlers are also here.                        **
;*                                                              **

        .include "g29k_reg.h"

        .eject
        .sbttl  "Clipping Trap Handlers"



;*                                                              **
;* Definitions                                                  **

;+                                                              ++
;+ Local Functions                                              ++
;+                                                              ++

        .sect   GRAPHX, text
        .use    GRAPHX

S_C2_01:



;|                                                              ||
;| Handles the skip if clipped trap.                            ||
;| Parameters: none                                             ||
;| Products:   none                                             ||
;| Return:     none                                             ||
;|                                                              ||

        add     tpc, LP.clp.skip_vec, 0
        mtsr    pc1, tpc
        add     tpc, LP.clp.skip_vec, 4
        mtsr    pc0, tpc
        iret                                ; resume at LP.clp.skip_vec

S_C2_02:



;|                                                              ||
;| Handles the stop if clipped trap.                            ||
;| Parameters: none                                             ||
;| Products:   none                                             ||
;| Return:     none                                             ||
;|                                                              ||

        add     tpc, LP.clp.stop_vec, 0
        mtsr    pc1, tpc
        add     tpc, LP.clp.stop_vec, 4
        mtsr    pc0, tpc
        iret                                ; resume at LP.clp.stop_vec



        .eject
        .sbttl  "Set Clipping Vectors"

;+                                                              ++
;+ Global Functions                                             ++
;+                                                              ++

        .use    GRAPHX
        .global S_C1_01

S_C1_01:



;|                                                              ||
;| Sets the clipping trap vectors to their handlers.            ||
;|                                                              ||
;| Parameters:                                                  ||
;| Products:                                                    ||
;| Return: none                                                 ||
;|                                                              ||






        .reg    Temp0, t0
        .reg    Temp1, t1

        const   Temp0, V_CLIP_SKIP
        sll     Temp0, Temp0, 2             ; vector number * 4 = entry address
        const   Temp1, S_C2_01
        consth  Temp1, S_C2_01
        store   0, 0, Temp1, Temp0          ; install skip handler
        const   Temp0, V_CLIP_STOP
        sll     Temp0, Temp0, 2
        const   Temp1, S_C2_02
        consth  Temp1, S_C2_02
        jmpi    ret
        store   0, 0, Temp1, Temp0          ; (delay slot) install stop handler

;*****************************************************************
        .end    ; of s_c1_01.s
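The effect of S_C1_01 and its two handlers can be modelled in C as a sketch, not as AMD code. The listing forms each entry's address as the vector number shifted left by 2 with no base added, which suggests it assumes the vector area starts at address 0. Each handler appears to load PC1 with the target address and PC0 with the target plus 4 before `iret`, so execution resumes at the target (this reading assumes the third operand of the first `add` in each handler is 0). The vector numbers below are placeholders, and `VectorTable`, `SetVector`, `TrapFrame`, and `RedirectTrap` are hypothetical names.

```c
#include <stdint.h>

/* Placeholder vector numbers: the real V_CLIP_SKIP and V_CLIP_STOP values
   come from the benchmark headers, which are not reproduced here. */
enum { V_CLIP_SKIP = 70, V_CLIP_STOP = 71 };

/* Model of a 256-entry vector area at address 0: each entry holds a
   4-byte handler address, so vector n lives at byte offset n << 2. */
static uint32_t VectorTable[256];

static void SetVector(unsigned vector, uint32_t handler)
{
    uint32_t byteOffset = (uint32_t)vector << 2;   /* sll   Temp0, Temp0, 2 */
    VectorTable[byteOffset / 4u] = handler;         /* store 0, 0, Temp1, Temp0 */
}

/* The PC values a handler leaves for iret: PC1 holds the first
   instruction to resume at and PC0 the one after it. */
typedef struct {
    uint32_t pc0, pc1;
} TrapFrame;

static void RedirectTrap(TrapFrame *frame, uint32_t target)
{
    frame->pc1 = target;        /* add tpc, vec, 0  ; mtsr pc1, tpc */
    frame->pc0 = target + 4u;   /* add tpc, vec, 4  ; mtsr pc0, tpc */
}
```

Loading both PC1 and PC0 is what makes the redirection clean: if only PC1 were changed, the instruction after the trapping one would still be fetched from the old PC0 address.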
Additional performance considerations, 73
Hardware assists, 74
Bit-map clearing, 75
Hardware cycles, 76
Saturation in hardware, 76
VRAM used for bit maps, 74
Pipelines, 73
Scaled arithmetic, 73
Am29000 graphics performance, 73
Axial half-width, 17

B

Benchmark results, 58
BITBLT, 61
BITBLT 16x16 color functions, 63, 64
BITBLT 16x16 monochrome functions, 61, 62
BITBLT 256x256 color functions, 67, 68
BITBLT 256x256 monochrome functions, 65, 66
Filled triangles, 69
Monochrome triangle functions, 72
Shaded triangle functions, 69
Single-width line functions, 59
Solid direct triangle functions, 70
Solid XOR triangle functions, 71
Test functions, 69
Text, 69
Vectors, 58
Wide/AA line functions, 60
Benchmarks, 57
Hardware models, 58
Summary, 73
Bibliography, 77
Bit assignments for first word of rasterized character format, 52
Bit map, 2, 3
Monochrome, 2
Bit maps, 2
Bit-map clearing, 75
BITBLT, 61
BITBLT 16x16 color functions, 63, 64
BITBLT 16x16 monochrome functions, 61, 62
BITBLT 256x256 color functions, 67, 68
BITBLT 256x256 monochrome functions, 65, 66
Block size values, 31

Clipping Tests, 24
Common files, 3

Copy block routines, 31
P_B1_01.S, 31
P_B1_02.S, 32
P_B2_01.S, 34
P_B2_02.S, 36
P_B3_01.S, 38
P_B3_02.S, 41
P_B4_01.S, 45

VRAM used for bit maps, 74

Hardware saturation, 76

Linker command file stepi.cmd, 58